[libvoikko] HFST speller lexicon spec - draft 0.2
hatapitk at iki.fi
Sun Nov 7 20:30:07 EET 2010
On Saturday 06 November 2010, Flammie Pirinen wrote:
> Another practical problem is that over the last few weeks I've been
> experimenting with different morphologies, and an increase in size quickly
> decreases processing efficiency; morphologies beyond 100 megs of
> transducer size already take up to a minute to load in OO.o
> (15 seconds with hfst-ospell and 30 with voikkospell on the same system).
> If zipping the files increases this time, it is a serious problem,
> as end users will not easily tolerate OpenOffice freezing at startup
> (current ooovoikko will do just that, with no clues to the user of what is
It's interesting that loading takes so long in OOo if hfst-ospell only needs
15 seconds. It might be that something gets loaded and unloaded more often
than necessary. This could very well have gone unnoticed since loading happens
more or less instantly when Malaga is used (we only map the lexicon to memory
at load time but don't actually read it).
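To illustrate why mapping is so much cheaper than reading, here is a minimal sketch in Python using the standard `mmap` module. It is not libvoikko's actual code; `map_lexicon` and `read_lexicon` are hypothetical helper names, and the lexicon is assumed to be a plain file on disk. Mapping only sets up the address range, and the OS loads pages lazily on first access, whereas reading copies every byte up front.

```python
import mmap


def map_lexicon(path):
    """Map a lexicon file into memory without reading its contents.

    Pages are loaded by the OS on first access, so this call returns
    quickly even for a very large file.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ gives a read-only view.
        # The mapping remains usable after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_lexicon(path):
    """Eagerly read the whole file into memory (touches every byte)."""
    with open(path, "rb") as f:
        return f.read()
```

With the mapped variant, the cost of reading the data is paid later, spread over the first lookups that touch each page, instead of all at once at startup.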
I will need to debug this. Is there a morphology somewhere that can be built
with current HFST trunk out of the box? I managed to build the HFST trunk (and
lost the tools from HFST2 in the process) and it appears that at least Omorfi
does not build with the new tools yet. Different command line options for
hfst-regexp2fst are causing problems.
Anyway, long load times are a problem. It would definitely be good to fix that
if at all possible, even if the problem occurs only with very few
languages. Displaying status information is something that we currently cannot
do from ooovoikko since these proofreading components can be initialized in
quite varying contexts. We can't know in advance if there is any UI at all and
if there is, what possibilities there are for displaying feedback. If we
really need to tell the user that "spell checker is loading, please wait" it
should be done from the OOo framework code. That certainly should be possible
but hopefully we don't need to go that far.