[libvoikko] HFST speller lexicon spec - draft 0.2

Flammie Pirinen flammie at iki.fi
Thu Nov 11 16:52:39 EET 2010


2010-11-07, Harri Pitkänen wrote:

> It's interesting that loading takes so long in OOo if hfst-ospell
> only needs 15 seconds. It might be that something gets loaded and
> unloaded more often than necessary. This could very well have gone
> unnoticed since loading happens more or less instantly when Malaga is
> used (we only map the lexicon to memory at load time but don't
> actually read it).

I've certainly noticed that OOo occasionally unloads and reloads the
dictionary during use, since the resulting delays are really
noticeable. During my testing with the Finnish HFST stuff I didn't see
it either, since the delay is mostly unnoticeable in the current
version I've used. (I attached voikko-hfst-ospell.patch for reference;
I'll commit it after the hfst-ospell library is released?)
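
To illustrate the difference Harri describes between mapping the
lexicon into memory and actually reading it, here is a minimal Python
sketch. It is not libvoikko or Malaga code: the dummy file and the
function names are made up for the example, and it assumes the OS
faults mapped pages in lazily, which is typical but not guaranteed.

```python
import mmap
import os
import tempfile


def make_dummy_lexicon(size_bytes):
    """Create a throwaway file standing in for a compiled lexicon (hypothetical data)."""
    fd, path = tempfile.mkstemp(suffix=".lex")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x01" * size_bytes)
    return path


def load_by_mapping(path):
    """Map the file read-only; the OS usually fetches pages only when they are touched."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def load_by_reading(path):
    """Read the whole file into memory up front, paying the full I/O cost at load time."""
    with open(path, "rb") as f:
        return f.read()


path = make_dummy_lexicon(8 * 1024 * 1024)
try:
    mapped = load_by_mapping(path)   # cheap: sets up the mapping only
    eager = load_by_reading(path)    # expensive: copies every byte now
    same_size = len(mapped) == len(eager)
    first_byte_matches = mapped[0] == eager[0]  # touching a byte pages it in
    mapped.close()
finally:
    os.remove(path)
```

With the mapping approach a repeated unload and reload costs next to
nothing, which would explain why it could go unnoticed with Malaga,
whereas a loader that reads everything pays the full price each time.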

On a slightly related note, if you want to test the specific case I've
mentioned, I suppose it's OK to point to it already: it's the
Greenlandic in Divvun's svn
<https://victorio.uit.no/langtech/trunk/st/kal> with the other patch I
attached. It requires HFST 2, the optimized stuff and foma.

> I will need to debug this. Is there a morphology somewhere that can
> be built with current HFST trunk out of the box? I managed to build
> the HFST trunk (and lost the tools from HFST2 in the process) and it
> appears that at least Omorfi does not build with the new tools yet.
> Different command line options for hfst-regexp2fst are causing
> problems.

Yeah, most of the morphologies will not be buildable with HFST3 until
twolc is ported or the xfst compiler gains wide enough support; I'd
estimate towards the end of November. I've personally been passing
--program-suffix=3 to configure while testing the HFST tools, so the
new tools get a 3 suffix and don't overwrite the HFST 2 ones. It's one
of those nifty features that make autotools worth using.

> Anyways long load times are a problem. It would definitely be good to
> fix that if at all possible, even if the problem would occur only
> with very few languages. Displaying status information is something
> that we currently cannot do from ooovoikko since these proofreading
> components can be initialized in quite varying contexts. We can't
> know in advance if there is any UI at all and if there is, what
> possibilities there are for displaying feedback. If we really need to
> tell the user that "spell checker is loading, please wait" it should
> be done from the OOo framework code. That certainly should be
> possible but hopefully we don't need to go that far.

Ah, I see. That's of course understandable. The way it usually starts
up is that the normal OOo progress bar loads just fine, taking quite a
while, and then after you type a few characters it really freezes. I
think if this freeze happened during the progress bar loading time it
would be no problem, but anyway, the ideal solution is to fix the root
cause of the slowdowns in our code.

-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kl-gl-hfst-speller.patch
Type: text/x-patch
Size: 24003 bytes
Desc: not available
URL: <http://lists.puimula.org/pipermail/libvoikko/attachments/20101111/74829b70/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: voikko-hfst-ospell.patch
Type: text/x-patch
Size: 22400 bytes
Desc: not available
URL: <http://lists.puimula.org/pipermail/libvoikko/attachments/20101111/74829b70/attachment-0001.bin>