[libvoikko] Moving support for HFST spellers from experimental to stable

Flammie Pirinen flammie at iki.fi
Tue Jan 8 08:40:27 EET 2013


2013-01-07, Harri Pitkänen wrote:

> On Monday 07 January 2013, Flammie Pirinen wrote:
> > I think the version of hfst-ospell command-line in the repository
> > should read and print the XML metadata when called with --verbose,
> > so the needed code is there, at least I remember implementing it
> > once.
> 
> Yes, but it uses the metadata_dump() method which combines all of the
> metadata into one string:
> 
>   if (verbose)
>     {
>       std::cout << "Following metadata was read from ZHFST archive:"
>                 << std::endl
>                 << speller.metadata_dump() << std::endl;
>     }

Ok, now I think I understand.
 
> Libvoikko would need the locale and some short (one line) description
> of the speller (written preferably in the target language). I could
> not find any easy way to read these. Implementation of
> ZHfstOspellerXmlMetadata::debug_dump() seems to read the description
> from many different XML elements and I'm not sure if all of the
> methods used there can be used from external code.

I think it makes sense to just have a getter for the metadata element
in ZHfstOspeller; after that it's just simple data structures
replicating the XML structure. So the locale should be at

speller.get_metadata().info_.locale_

and the title at

speller.get_metadata().info_.title_["LL"]

where LL is the language code matching the lang attribute of the title
in the metadata. title_ is a map<string,string> from language code to
title. I believe the obligatory title is the one matching the locale,
so that one is always guaranteed to be there, but perhaps the first one
presented to the end user should be the one matching their locale
settings, if available.

I committed this to svn now. We could also use getters and setters, but
I don't see them adding much value here.
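
A rough sketch of the title lookup described above, not the committed
code. MetadataSketch and InfoSketch are stand-in types that mirror only
the info_, locale_ and title_ fields named in this mail, and
pick_title() is a hypothetical helper, so treat the names as
assumptions:

```cpp
#include <map>
#include <string>

// Stand-in types: they mirror only the fields mentioned in this mail.
// The real hfst-ospell metadata classes have more members and may differ.
struct InfoSketch {
    std::string locale_;                        // locale of the speller
    std::map<std::string, std::string> title_;  // language code -> title
};

struct MetadataSketch {
    InfoSketch info_;
};

// Hypothetical helper: prefer the title in the user's language, then the
// title matching the speller's own locale, then a generic description.
std::string pick_title(const MetadataSketch& md, const std::string& user_lang) {
    const std::map<std::string, std::string>& titles = md.info_.title_;
    std::map<std::string, std::string>::const_iterator it = titles.find(user_lang);
    if (it != titles.end()) {
        return it->second;
    }
    it = titles.find(md.info_.locale_);
    if (it != titles.end()) {
        return it->second;
    }
    return "HFST speller for " + md.info_.locale_;
}
```

Using find() rather than title_["LL"] avoids inserting an empty entry
into the map when the requested language is missing.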

> > I think it should be doable. Assuming the files in voikko/3/ are
> > language coded as speller-...zhfst, the only missing piece is to
> > have the language code parsing, or just enumerating all zhfst files,
> > currently I think the code just uses hard-coded speller.zhfst in 2/
> > dirs instead (and the pro file parsing).
> 
> Encoding the locale into the file name is fine (I think) but then we
> would still need the description (this is used at least by
> "voikkospell -l", Webvoikko and libreoffice-voikko):
> 
> $ voikkospell -l
> ...
> fi-x-standard: suomi (perussanasto)
> fi-x-ovfst: Omorfi-pohjainen VFST-morfologia
> fi-x-apertium: Experimental Lttoolbox morphology
> fi-x-dialect: murteellisten, vanhojen ja harvinaisten sanojen sanasto
> fi-x-hfst: Kokeellinen HFST-morfologia
> fi-x-malmor: suomi (perussanasto)
> fi-x-malstd: suomi (perussanasto)
> fi-x-medicine: matematiikan, fysiikan, kemian ja lääketieteen sanastot
> ...
> 
> Of course we could just use a semi-constant description like "HFST
> speller for $LOCALE". That would be good enough for most uses.

I think that will be enough for use cases before unpacking the zip, and
if the user wants a leaner version of the library without XML libs. Or
we can just retain some pro or similar files holding just these pieces
of information, which can be copied there automatically from the
package when installing.
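
A minimal sketch of the XML-free fallback, under assumptions: the
"speller-<locale>.zhfst" naming scheme is only my reading of the
proposal quoted above, and locale_from_filename() and
fallback_description() are hypothetical names, not existing libvoikko
functions:

```cpp
#include <string>

// Assumed naming scheme: "speller-<locale>.zhfst". Returns the locale
// part, or an empty string if the file name does not fit the scheme.
std::string locale_from_filename(const std::string& name) {
    const std::string prefix = "speller-";
    const std::string suffix = ".zhfst";
    // The name must be longer than prefix + suffix to hold a locale.
    if (name.size() <= prefix.size() + suffix.size()) {
        return "";
    }
    if (name.compare(0, prefix.size(), prefix) != 0) {
        return "";
    }
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return "";
    }
    return name.substr(prefix.size(),
                       name.size() - prefix.size() - suffix.size());
}

// Semi-constant description that needs no access to the XML metadata.
std::string fallback_description(const std::string& locale) {
    return "HFST speller for " + locale;
}
```

With this, a directory listing alone is enough to produce a locale and
a one-line description; the real title from the metadata could replace
it once the archive has been read.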

On that end, I think we could make a graphical zhfst installer: a
really simple dialog that copies the file to the right place when the
user clicks a zhfst link on the internet or double-clicks a zhfst file
in a file manager. I think writing that should be some 50 lines of
code, and it could do this title caching along the way as well.

-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>


