[libvoikko] Moving support for HFST spellers from experimental to stable

Harri Pitkänen hatapitk at iki.fi
Mon Jan 7 21:55:57 EET 2013


On Monday 07 January 2013, Flammie Pirinen wrote:
> I think the version of hfst-ospell command-line in the repository
> should read and print the XML metadata when called with --verbose, so
> the needed code is there, at least I remember implementing it once.

Yes, but it uses the metadata_dump() method which combines all of the metadata 
into one string:

  if (verbose)
    {
      std::cout << "Following metadata was read from ZHFST archive:"
                << std::endl
                << speller.metadata_dump() << std::endl;
    }

Libvoikko would need the locale and a short (one-line) description of the 
speller, preferably written in the target language. I could not find any easy 
way to read these. The implementation of 
ZHfstOspellerXmlMetadata::debug_dump() seems to read the description from many 
different XML elements, and I'm not sure whether all of the methods used there 
can be called from external code.

> I think it should be doable. Assuming the files in voikko/3/ are
> language coded as speller-...zhfst, the only missing piece is to have
> the language code parsing, or just enumerating all zhfst files,
> currently I think the code just uses hard-coded speller.zhfst in 2/
> dirs instead (and the pro file parsing).

Encoding the locale into the file name is fine (I think), but then we would 
still need the description. It is used at least by "voikkospell -l", 
Webvoikko and libreoffice-voikko:

$ voikkospell -l
...
fi-x-standard: suomi (perussanasto)
fi-x-ovfst: Omorfi-pohjainen VFST-morfologia
fi-x-apertium: Experimental Lttoolbox morphology
fi-x-dialect: murteellisten, vanhojen ja harvinaisten sanojen sanasto
fi-x-hfst: Kokeellinen HFST-morfologia
fi-x-malmor: suomi (perussanasto)
fi-x-malstd: suomi (perussanasto)
fi-x-medicine: matematiikan, fysiikan, kemian ja lääketieteen sanastot
...

Of course we could just use a semi-constant description like "HFST speller for 
$LOCALE". That would be good enough for most uses.
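
As a rough sketch of that fallback: the locale could be taken from the file 
name and the description generated from it. The helper names below are made up 
for illustration, and the "speller-<locale>.zhfst" naming is only an assumption 
based on the suggestion quoted above. Nothing here is existing libvoikko or 
hfst-ospell API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of libvoikko or hfst-ospell): returns the
// locale encoded in a file name of the assumed form "speller-<locale>.zhfst",
// or an empty string if the name does not follow that pattern.
std::string localeFromFileName(const std::string & fileName) {
    const std::string prefix = "speller-";
    const std::string suffix = ".zhfst";
    // The name must contain at least one character between prefix and suffix.
    if (fileName.size() <= prefix.size() + suffix.size()) {
        return "";
    }
    if (fileName.compare(0, prefix.size(), prefix) != 0) {
        return "";
    }
    const std::string::size_type suffixStart = fileName.size() - suffix.size();
    if (fileName.compare(suffixStart, suffix.size(), suffix) != 0) {
        return "";
    }
    return fileName.substr(prefix.size(), suffixStart - prefix.size());
}

// Hypothetical helper: builds the semi-constant description used when no
// description can be read from the speller metadata.
std::string fallbackDescription(const std::string & locale) {
    assert(!locale.empty());
    return "HFST speller for " + locale;
}
```

A file that does not match the pattern (such as the current hard-coded 
speller.zhfst) yields an empty locale and would have to be skipped or handled 
separately by the caller.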

Harri


