[libvoikko] HFST speller lexicon spec - draft 0.2

Harri Pitkänen hatapitk at iki.fi
Sun Nov 7 20:30:07 EET 2010

On Saturday 06 November 2010, Flammie Pirinen wrote:
> Another practical problem is that over the last few weeks I've been
> experimenting with different morphologies, and an increase in size causes
> a fairly rapid decrease in processing efficiency; morphologies beyond 100
> megs of transducer size already take up to a minute to load (in OO.o,
> 15 seconds with hfst-ospell and 30 with voikkospell on the same system).
> If zipping the files increases this time, it is a serious problem,
> as end users will not easily tolerate openoffice freezing at startup
> (current ooovoikko will do just that, with no clues to user of what is
> happening).

It's interesting that loading takes so long in OOo if hfst-ospell only needs 
15 seconds. It might be that something gets loaded and unloaded more often 
than necessary. This could very well have gone unnoticed since loading happens 
more or less instantly when Malaga is used (we only map the lexicon to memory 
at load time but don't actually read it).
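
To illustrate why mapping is so much cheaper than reading at load time, here 
is a minimal Python sketch. It is not libvoikko or Malaga code: the function 
names and the dummy file are made up for illustration, and the only assumption 
is that the lexicon is a single flat file. Mapping returns almost immediately 
because pages are only fetched from disk when they are first touched, whereas 
a full read pays for the whole file up front.

```python
import mmap
import os
import tempfile
import time


def map_lexicon(path):
    """Map the file read-only; pages are read from disk only when touched."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_lexicon(path):
    """Read the whole file into memory up front."""
    with open(path, "rb") as f:
        return f.read()


def make_dummy_lexicon(size_bytes):
    """Write a throwaway file standing in for a real lexicon (hypothetical)."""
    fd, path = tempfile.mkstemp(suffix=".lex")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * size_bytes)
    return path


if __name__ == "__main__":
    path = make_dummy_lexicon(16 * 1024 * 1024)
    try:
        t0 = time.perf_counter()
        mapped = map_lexicon(path)
        t1 = time.perf_counter()
        data = read_lexicon(path)
        t2 = time.perf_counter()
        print("mmap: %.4f s, full read: %.4f s" % (t1 - t0, t2 - t1))
        mapped.close()
    finally:
        os.remove(path)
```

The actual numbers depend on the disk and on what is already in the page 
cache, so this only shows the shape of the difference, not the real load 
times of any speller backend.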

I will need to debug this. Is there a morphology somewhere that can be built 
with current HFST trunk out of the box? I managed to build the HFST trunk (and 
lost the tools from HFST2 in the process) and it appears that at least Omorfi 
does not build with the new tools yet. Different command line options in 
hfst-regexp2fst are causing problems.

Anyway, long load times are a problem. It would definitely be good to fix that 
if at all possible, even if the problem occurs with only a few 
languages. Displaying status information is something that we currently cannot 
do from ooovoikko since these proofreading components can be initialized in 
quite varying contexts. We can't know in advance if there is any UI at all and 
if there is, what possibilities there are for displaying feedback. If we 
really need to tell the user that "spell checker is loading, please wait", it 
should be done from the OOo framework code. That certainly should be possible 
but hopefully we don't need to go that far.

