[libvoikko] HFST speller lexicon spec - draft 0.2

Flammie Pirinen flammie at iki.fi
Sun Nov 7 01:16:43 EET 2010


[Forgot the link, and added a few notes to sound less pessimistic]

2010-11-06, Flammie Pirinen wrote:

> 2010-10-13, Sjur Moshagen wrote:
> 
> > It would also be nice to know whether there are any concerns for
> > adding support for this specification to hfst-ospell.
>
> First, I am not quite sure where to find a free zip library, even for
> the subset of features now specified; zlib, for example, claims
> support for the algorithm but not for the container format[1].
> Ideally there would be some small library available on all systems
> for this, so that hfst-ospell does not gain any more dependencies.

[1] http://www.zlib.net/zlib_faq.html#faq11 talks about a command-line
program whose source code could perhaps be reused for this purpose.
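
To make the algorithm versus container distinction concrete, here is a
minimal sketch using Python's standard library. It is not hfst-ospell
code, and the member name "lexicon.txt" and the helper names are
made up for illustration. It builds a one-member zip in memory, shows
that plain zlib decompression rejects the archive as a whole, and then
parses the local file header by hand so that zlib can inflate the raw
deflate data inside it. It assumes a single deflate-compressed member
at the start of the archive and ignores the central directory, CRC
checks and everything else a real zip reader has to handle.

```python
import io
import struct
import zipfile
import zlib

# Fixed-size part of a zip local file header (30 bytes, little-endian).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50  # the bytes "PK\x03\x04"


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a one-member, deflate-compressed zip archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def zlib_alone_can_read(archive: bytes) -> bool:
    """Try to decompress the whole archive as if it were a zlib stream."""
    try:
        zlib.decompress(archive)
    except zlib.error:
        return False
    return True


def inflate_first_member(archive: bytes) -> bytes:
    """Hand-parse the first local header, then inflate the raw deflate data."""
    fields = LOCAL_HEADER.unpack_from(archive, 0)
    signature, method = fields[0], fields[3]
    name_len, extra_len = fields[9], fields[10]
    if signature != LOCAL_SIGNATURE or method != 8:  # 8 = deflate
        raise ValueError("not a deflate-compressed zip member")
    data_start = LOCAL_HEADER.size + name_len + extra_len
    # Negative wbits selects a raw deflate stream (no zlib header or
    # trailer); the decompressor stops at the end of that stream.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    return inflater.decompress(archive[data_start:]) + inflater.flush()


if __name__ == "__main__":
    payload = b"kissa\nkoira\n" * 100
    archive = make_zip("lexicon.txt", payload)  # hypothetical member name
    print("zlib alone reads the archive:", zlib_alone_can_read(archive))
    print("payload recovered by hand:", inflate_first_member(archive) == payload)
```

The header parsing is the part zlib does not provide, and it is what a
small container library would have to supply on top of the algorithm.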

> [I]ncrease in size will cause a fairly rapid decrease in processing
> efficiency; morphologies beyond 100 megabytes of transducer size will
> already take up to a minute to load

This of course applies to the most morphologically complex languages,
e.g. one polysynthetic language I am currently working on. In the
typical tests I have done converting hunspell dictionaries to automata,
the resulting automata have ranged in size from a few kilobytes to 10
megabytes, and the same holds for most FLOSS FST morphologies that can
be found by searching the net.

Also, it is most likely possible to optimize the automata using some
automatic (tbd) or manual (flag diacritics, etc.) methods, but these of
course require further programming work, as mentioned.

-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>


