[libvoikko] HFST speller lexicon spec - draft 0.2

Harri Pitkänen hatapitk at iki.fi
Sun Nov 7 10:25:21 EET 2010


On Saturday 06 November 2010, Flammie Pirinen wrote:
> For the first I am not quite sure as to
> where to find any free zip library even for the subset of features now
> specified; zlib claims support for the algorithm but not the container
> format for example[1]. Ideally there would be some small library
> available on all systems for this use as to not get any more
> dependencies for hfst-ospell.

Would http://zziplib.sourceforge.net/ work? I have not looked at it closely; I
just found it with "apt-cache search unzip", but it seems to handle what we
need and is licensed under the LGPL.
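
To make the algorithm-versus-container distinction concrete, here is a
minimal sketch in Python (not the hfst-ospell code, and not zziplib's API).
It only shows that a deflate implementation alone cannot open a zip archive,
while a reader that understands the container can. The member name
"example.bin" is a placeholder assumed for illustration, not something taken
from the draft spec.

```python
import io
import zipfile
import zlib


def build_archive(members):
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_member(archive_bytes, name):
    """Read one member through a reader that parses the zip container."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)


def zlib_alone_can_read(archive_bytes):
    """Try to decompress the whole archive as a plain zlib stream."""
    try:
        zlib.decompress(archive_bytes)
        return True
    except zlib.error:
        # The bytes start with the zip local file header, not a zlib header.
        return False


if __name__ == "__main__":
    archive = build_archive({"example.bin": b"placeholder data" * 10})
    print(len(read_member(archive, "example.bin")))
    print(zlib_alone_can_read(archive))
```

A C library such as zziplib would play the role that zipfile plays here,
with zlib still doing the actual decompression underneath.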

> By the way, on voikko part of the world, can I expect that spellers can
> be tossed to $voikkodir/3/*.zhfst?  

Yes, I think so. I have not yet had time to think about this, but ideally we
should make this as simple as possible so that no extra configuration is
needed.
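
As a rough sketch of what "no extra configuration" could mean, the lookup
below simply lists every *.zhfst file found in the "3" subdirectory of a
given base directory. It is written in Python for brevity and is not
libvoikko code. The function name find_spellers and the way the base
directory is passed in are assumptions made for illustration.

```python
from pathlib import Path


def find_spellers(voikko_dir):
    """Return the *.zhfst files directly under <voikko_dir>/3, sorted by path.

    Hypothetical helper: the real v3 lookup rules are not decided yet.
    """
    speller_dir = Path(voikko_dir) / "3"
    if not speller_dir.is_dir():
        # No v3 directory at all means no HFST spellers are installed.
        return []
    return sorted(p for p in speller_dir.glob("*.zhfst") if p.is_file())


if __name__ == "__main__":
    import sys

    base = sys.argv[1] if len(sys.argv) > 1 else "."
    for speller in find_spellers(base):
        print(speller)
```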

We might of course want to support combining HFST-based and other tools for
some languages, for example an acceptor implemented with HFST together with a
spelling suggestion error model or hyphenator coded directly in C. Such setups
would require an extra configuration file, something that would replace
voikko-fi_FI.pro in the current v2 configuration. We would then also need to
figure out how to set the default speller for a language when more than one is
available, and how the user can change this default when there is no
application-level support for selecting the variant. But I believe none of
these requirements prevents us from handling the simplest use case the way you
suggested.
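
One possible policy for the default-speller question, sketched in Python
purely as an illustration: honour a user-set default if it names an
installed variant, otherwise fall back to the first variant in sorted order.
Nothing here is decided. The function name, the variant identifiers and the
fallback rule are all assumptions.

```python
def choose_speller(available, user_default=None):
    """Pick one speller variant from those available for a language.

    Hypothetical policy: a valid user default wins, otherwise the first
    variant in sorted order is used so that the result is deterministic.
    """
    if not available:
        return None
    if user_default is not None and user_default in available:
        return user_default
    # Fallback when the user has no preference or it names a missing variant.
    return sorted(available)[0]


if __name__ == "__main__":
    # Variant names below are made up for the example.
    variants = ["variant-b", "variant-a"]
    print(choose_speller(variants))
    print(choose_speller(variants, user_default="variant-b"))
```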

I will publish the first release candidate of libvoikko 3.1 very soon, maybe
tomorrow. After that it will be possible to start coding the v3 configuration
and really add support for all these things. Hopefully we can then release
libvoikko 3.2 with non-experimental support for HFST-based spellers.

Harri


