[libvoikko] Moving support for HFST spellers from experimental to stable
Harri Pitkänen
hatapitk at iki.fi
Mon Jan 7 21:55:57 EET 2013
On Monday 07 January 2013, Flammie Pirinen wrote:
> I think the version of the hfst-ospell command-line tool in the repository
> should read and print the XML metadata when called with --verbose, so
> the needed code is there; at least, I remember implementing it once.
Yes, but it uses the metadata_dump() method, which combines all of the metadata
into one string:
if (verbose)
  {
    std::cout << "Following metadata was read from ZHFST archive:"
              << std::endl
              << speller.metadata_dump() << std::endl;
  }
Libvoikko would need the locale and a short (one-line) description of the
speller, preferably written in the target language. I could not find any easy
way to read these. The implementation of ZHfstOspellerXmlMetadata::debug_dump()
seems to read the description from many different XML elements, and I'm not
sure whether all of the methods used there can be called from external code.
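A minimal sketch of the selection step Libvoikko would need, assuming the per-language titles have already been pulled out of the archive's metadata by some other means (no hfst-ospell accessor is assumed here). `pick_description` and `first_line` are hypothetical helpers: they prefer the speller's own language, fall back to English, then to any available title, and keep only the first line.

```cpp
#include <map>
#include <string>

// Hypothetical helper: keep only the first line, since Libvoikko
// wants a one-line description.
std::string first_line(const std::string& text)
{
    return text.substr(0, text.find('\n'));
}

// Hypothetical helper, not part of hfst-ospell. `titles` maps a language
// code (e.g. "fi", "en") to a description obtained elsewhere.
std::string pick_description(const std::map<std::string, std::string>& titles,
                             const std::string& locale)
{
    // Language part of the locale: "fi-x-hfst" -> "fi", "se" -> "se".
    const std::string lang = locale.substr(0, locale.find('-'));

    // 1. Prefer a description in the speller's own language.
    auto it = titles.find(lang);
    if (it != titles.end()) {
        return first_line(it->second);
    }
    // 2. Fall back to English.
    it = titles.find("en");
    if (it != titles.end()) {
        return first_line(it->second);
    }
    // 3. Otherwise take any title (the smallest language code in the map).
    if (!titles.empty()) {
        return first_line(titles.begin()->second);
    }
    return "";
}
```

With titles for "fi" and "en", a speller with locale `fi-x-hfst` would get the Finnish text, while a locale such as `se` with no matching entry would fall back to the English one.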
> I think it should be doable. Assuming the files in voikko/3/ are
> named with language codes as speller-...zhfst, the only missing piece is
> the language code parsing, or just enumerating all zhfst files;
> currently, I think, the code just uses a hard-coded speller.zhfst in 2/
> dirs instead (and the pro file parsing).
Encoding the locale into the file name is fine (I think), but then we would
still need the description (it is used at least by "voikkospell -l",
Webvoikko and libreoffice-voikko):
$ voikkospell -l
...
fi-x-standard: suomi (perussanasto)
fi-x-ovfst: Omorfi-pohjainen VFST-morfologia
fi-x-apertium: Experimental Lttoolbox morphology
fi-x-dialect: murteellisten, vanhojen ja harvinaisten sanojen sanasto
fi-x-hfst: Kokeellinen HFST-morfologia
fi-x-malmor: suomi (perussanasto)
fi-x-malstd: suomi (perussanasto)
fi-x-medicine: matematiikan, fysiikan, kemian ja lääketieteen sanastot
...
Of course, we could just use a semi-constant description such as "HFST speller
for $LOCALE". That would be good enough for most uses.
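A sketch of that fallback, assuming the locale is encoded in the archive's file name. The thread leaves the exact naming scheme open, so the hypothetical pattern `speller-<locale>.zhfst` used here is an illustration only, as are the helper names. It needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <string>

// Hypothetical naming scheme, assumed for illustration only:
// "speller-<locale>.zhfst", e.g. "speller-fi-x-hfst.zhfst".
// Returns the locale part, or std::nullopt if the name does not match.
std::optional<std::string> locale_from_filename(const std::string& name)
{
    const std::string prefix = "speller-";
    const std::string suffix = ".zhfst";
    // Require a non-empty locale between prefix and suffix.
    if (name.size() <= prefix.size() + suffix.size()) {
        return std::nullopt;
    }
    if (name.compare(0, prefix.size(), prefix) != 0) {
        return std::nullopt;
    }
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return std::nullopt;
    }
    return name.substr(prefix.size(),
                       name.size() - prefix.size() - suffix.size());
}

// Semi-constant fallback description: "HFST speller for $LOCALE".
std::string default_description(const std::string& locale)
{
    return "HFST speller for " + locale;
}

// One "locale: description" line, in the format of the listing above.
std::string listing_line(const std::string& locale)
{
    return locale + ": " + default_description(locale);
}
```

Under this assumed scheme, a file named `speller-fi-x-hfst.zhfst` would be listed as `fi-x-hfst: HFST speller for fi-x-hfst`, while files that do not match the pattern (such as the current hard-coded `speller.zhfst`) are skipped.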
Harri