[libvoikko] HFST speller lexicon spec - draft 0.2
Harri Pitkänen
hatapitk at iki.fi
Sun Nov 7 10:25:21 EET 2010
On Saturday 06 November 2010, Flammie Pirinen wrote:
> For the first I am not quite sure as to
> where to find any free zip library even for the subset of features now
> specified; zlib claims support for the algorithm but not the container
> format for example[1]. Ideally there would be some small library
> available on all systems for this use as to not get any more
> dependencies for hfst-ospell.
Would http://zziplib.sourceforge.net/ work? I have not looked at it closely; I
just found it with "apt-cache search unzip", but it seems to handle the
necessary stuff and is licensed under the LGPL.
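To illustrate the distinction between the compression algorithm and the container format raised in the quoted text, here is a rough Python sketch (not zziplib, and not anything hfst-ospell does). It uses the standard zipfile module only to find where a member sits inside the container, and then inflates the raw bytes with zlib alone. The member name "acceptor.bin" and both function names are made up for the example.

```python
import io
import struct
import zipfile
import zlib


def build_archive(member_name: str, payload: bytes) -> bytes:
    """Create an in-memory zip archive with one deflate-compressed member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, payload)
    return buffer.getvalue()


def inflate_member(archive_bytes: bytes, member_name: str) -> bytes:
    """Locate a member via the zip container, then inflate it with zlib only."""
    # Container part: the central directory tells us where the member starts
    # and how many compressed bytes it has.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        info = archive.getinfo(member_name)
    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("member is not deflate-compressed")

    # The local file header has 30 fixed bytes, followed by the file name and
    # the extra field; their lengths are stored at offsets 26 and 28.
    header = archive_bytes[info.header_offset:info.header_offset + 30]
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = info.header_offset + 30 + name_len + extra_len
    raw = archive_bytes[start:start + info.compress_size]

    # Algorithm part: a negative wbits value makes zlib read a raw deflate
    # stream, which is what a zip member contains (no zlib header or trailer).
    return zlib.decompress(raw, -15)
```

So zlib covers the second half, and the container parsing in the first half is the part a separate library would have to provide.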
> By the way, on voikko part of the world, can I expect that spellers can
> be tossed to $voikkodir/3/*.zhfst?
Yes, I think so. I have not yet had time to think about this, but ideally we
should make it as simple as possible so that no extra configuration is
needed.
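As a minimal sketch of what "no extra configuration" could look like, the following Python snippet simply lists every *.zhfst file directly under the "3" subdirectory of a given voikko directory. This is only an illustration of the idea, not libvoikko code; the function name find_zhfst_spellers and the way the directory is passed in are assumptions.

```python
from pathlib import Path
from typing import List


def find_zhfst_spellers(voikko_dir: Path) -> List[Path]:
    """Return all *.zhfst files directly under <voikko_dir>/3, sorted by name."""
    speller_dir = voikko_dir / "3"
    if not speller_dir.is_dir():
        # No v3 directory at all: nothing to offer, but not an error either.
        return []
    # Only regular files count; a directory named "x.zhfst" is ignored.
    return sorted(p for p in speller_dir.glob("*.zhfst") if p.is_file())
```

With a layout like this, dropping a new file into the directory would be enough for it to be picked up on the next scan.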
Of course, we might want to support combining HFST-based and other tools for
some languages, for example an acceptor implemented with HFST together with a
spelling suggestion error model or hyphenator coded directly in C. Such setups
would require an extra configuration file, something that would replace
voikko-fi_FI.pro in the current v2 configuration. We would then also need to
figure out how to set the default speller for a language when more than one is
available, and how the user can change this default when there is no
application-level support for selecting the variant. But I believe none of
these requirements prevents us from handling the simplest use case the way you
suggested.
I will publish the first release candidate of libvoikko 3.1 very soon, maybe
tomorrow. After that it will be possible to start coding the v3 configuration
and really add support for all these things. Hopefully we can then release
libvoikko 3.2 with non-experimental support for HFST-based spellers.
Harri