[libvoikko] HFST speller lexicon spec - draft 0.2

Flammie Pirinen flammie at iki.fi
Sat Nov 6 20:50:02 EET 2010


2010-10-13, Sjur Moshagen wrote:

> It would also be nice to know whether there are any concerns for
> adding support for this specification to hfst-ospell.

At the moment I think the spec is good to go and I only have practical
engineering issues at hand. 

First, I am not quite sure where to find a free zip library, even for
the subset of features now specified; zlib, for example, claims support
for the compression algorithm but not the container format[1]. Ideally
there would be some small library available on all systems for this, so
that hfst-ospell does not pick up any more dependencies.
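
To illustrate the algorithm-versus-container distinction, here is a
minimal sketch in Python (used only for illustration; it is not what
hfst-ospell is written in). It assumes a single deflate-compressed entry
at the very start of the archive, with no encryption and no zip64, and
it does not verify the CRC. The function names and the entry name
speller.hfst are hypothetical. The local file header is parsed by hand,
and zlib is only asked to inflate the raw deflate stream that follows.

```python
import io
import struct
import zipfile
import zlib

# Fixed part of a ZIP local file header: 30 bytes, little-endian.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50
METHOD_DEFLATE = 8


def read_first_entry(archive):
    """Return (name, data) of the first entry, using zlib only to inflate."""
    (signature, _version, _flags, method, _mtime, _mdate,
     _crc, _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(archive, 0)
    if signature != LOCAL_SIGNATURE:
        raise ValueError("not a ZIP local file header")
    if method != METHOD_DEFLATE:
        raise ValueError("only the deflate method is handled in this sketch")

    name_start = LOCAL_HEADER.size
    name = archive[name_start:name_start + name_len].decode("utf-8")
    data_start = name_start + name_len + extra_len

    # A negative wbits value selects a raw deflate stream (no zlib/gzip wrapper).
    inflater = zlib.decompressobj(-15)
    data = inflater.decompress(archive[data_start:])
    if not inflater.eof:
        raise ValueError("truncated deflate stream")
    return name, data


def make_demo_archive(name, payload):
    """Build a one-entry zip in memory (hypothetical helper for the demo)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    demo = make_demo_archive("speller.hfst", b"transducer bytes " * 100)
    entry_name, entry_data = read_first_entry(demo)
    print(entry_name, len(entry_data))
```

A real reader would also have to walk the central directory and handle
the other cases listed above, which is exactly the part zlib itself
does not provide.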

Another practical problem: over the last few weeks I've been
experimenting with different morphologies, and processing efficiency
drops quite fast as size increases; morphologies beyond 100 megs of
transducer size already take up to a minute to load (in OO.o,
15 seconds with hfst-ospell and 30 with voikkospell on the same system).
If zipping the files increases this time, it is a serious problem,
as end users will not easily tolerate OpenOffice freezing at startup
(the current ooovoikko does just that, with no clue to the user about
what is happening).
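
One rough way to see how much the unzip step itself adds is to time
reading the same bytes with and without the container. The following
Python sketch does this purely in memory, so it ignores disk I/O and the
cost of building the transducer; the function names and the member name
speller.hfst are hypothetical, and real figures would have to come from
actual speller files on the target system.

```python
import io
import time
import zipfile


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_load(payload, member="speller.hfst"):
    """Time a plain in-memory read against reading the same data from a zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, payload)
    zipped = buf.getvalue()

    def read_plain():
        return io.BytesIO(payload).read()

    def read_zipped():
        with zipfile.ZipFile(io.BytesIO(zipped)) as zf:
            return zf.read(member)

    plain, plain_seconds = timed(read_plain)
    unzipped, zip_seconds = timed(read_zipped)
    return {
        "identical": plain == unzipped,
        "plain_seconds": plain_seconds,
        "zip_seconds": zip_seconds,
        "zipped_size": len(zipped),
    }


if __name__ == "__main__":
    # Stand-in data; a real test would read an actual transducer file.
    report = compare_load(bytes(range(256)) * 4096)
    print(report)
```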

All in all, I think what I'd aim for is to publish my current
hfst-ospell patch against voikko, along with the current hfst-ospell,
and then start developing the hfst-ospell lib and voikko's hfst part to
match the lexicon spec.

By the way, on the voikko side of things, can I expect that spellers can
be dropped into $voikkodir/3/*.zhfst?

-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>


