[libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend

Harri Pitkänen hatapitk at iki.fi
Mon Mar 1 17:36:12 EET 2010


On Sunday 28 February 2010 22:07:12 Francis Tyers wrote:
> Nah, that doesn't do it :/
>
> The new method should probably just be a copy of the old one, only that
> checks to see if all the input has been consumed.

I came up with an ugly workaround. For every input string we compare the
output of biltransWithoutQueue with its output for the same input with the
last character removed. If removing that character does not change the result,
we can be quite confident that the original input was invalid.
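
To make the idea concrete, here is a rough sketch of the comparison. It is not
the actual libvoikko code: toyAnalyze is a made-up stand-in for the real
analysis call, and probablyValid and the Analyzer type are names invented for
this illustration. The toy analyzer is assumed to behave like the problematic
case, returning the analysis of the longest prefix it recognises and silently
ignoring any input left over.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical signature standing in for the real analysis call.
using Analyzer = std::function<std::wstring(const std::wstring&)>;

// Toy analyzer (an assumption for illustration only): returns the analysis
// of the longest known prefix and ignores any unconsumed trailing input.
std::wstring toyAnalyze(const std::wstring& input) {
    static const std::map<std::wstring, std::wstring> lexicon = {
        {L"cat", L"cat<n><sg>"},
        {L"cats", L"cat<n><pl>"},
    };
    for (std::size_t len = input.size(); len > 0; --len) {
        auto it = lexicon.find(input.substr(0, len));
        if (it != lexicon.end()) {
            return it->second;
        }
    }
    return L"";  // nothing recognised
}

// The workaround: analyse the input twice, once as given and once with the
// last character removed. If both results are equal, the last character was
// probably never consumed, so the input is treated as invalid.
bool probablyValid(const std::wstring& input, const Analyzer& analyze) {
    if (input.empty()) {
        return false;
    }
    const std::wstring full = analyze(input);
    const std::wstring truncated = analyze(input.substr(0, input.size() - 1));
    return full != truncated;
}
```

With this toy lexicon, probablyValid(L"cat", toyAnalyze) is true because "ca"
yields no analysis, while probablyValid(L"catx", toyAnalyze) is false because
"catx" and "cat" produce the same output. Note the two analyzer calls per
word.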

Of course this makes the analysis twice as slow as it used to be, and the
method is not, at least in theory, 100% accurate. But it seems to work. A
new method in Lttoolbox is still needed if we want to do this properly.

I also fixed a bug in the core libvoikko speller code that, due to the way our
Malaga backend is implemented, never showed up in Finnish spell checking but
caused lots of incorrectly rejected words when using Lttoolbox. It now seems
to me that the remaining issues with Icelandic spell checking are due either
to words missing from the lexicon or to wrong tokenization in OOo. If you find
anything that still needs fixing, let me know.

Harri


