[libvoikko] Lttoolbox - libvoikko backend integration issues?

Harri Pitkänen hatapitk at iki.fi
Sat Apr 3 12:01:43 EEST 2010


Hi!

On Friday 02 April 2010 18:34:12 David Cheah wrote:
> Have been working to build Voikko in MinGW as per the Windows instructions
> on the Voikko page, sadly the Malaga installation isn't giving me any files
> at all - currently looking into it. Think i should be able to find out
> whats going wrong over the weekend.

I don't know if you noticed, but yesterday I added a link on the Windows
instructions page pointing to more up-to-date instructions written by Marko
Wallin. They might help, although nothing has changed on the Malaga side, so
the problem with it is completely new to me. If you can figure it out, let us
know how you did it and we will update the instructions.

The current version of libvoikko does not need the Malaga library anymore;
Malaga is needed only for generating the Finnish morphology. Therefore I think
you don't need to spend too much time on this problem if it turns out to be
something very complicated. You can download a pre-built Malaga morphology
from here:
  http://www.puimula.org/htp/voikko/suomimalaga/dict.zip 

> Also, I've been looking at the spelling suggestions in
> src/spellchecker/suggestions, and have a few questions.
>
> Am I right in saying that the task at hand is basically to write additional
> methods and new SuggestionGenerator(etc) files into the
> src/spellchecker/suggestions folder, which would leave the methods
> involving SuggestionStrategy and SuggestionStatus mostly unmodified?

Yes, this is exactly what needs to be done. Additionally, you need to add the
new generator to SuggestionGeneratorFactory so that it actually gets used.

> Also, what additional features/new implementations would be good to have in
> the suggestions algorithm? Or perhaps what is the current reason why there
> is a need for a better suggestions algorithm?

The current algorithms produce suggestions in a way that is specifically
designed for the Finnish language:
- No suggestions are generated that insert or replace characters that are
not used in Finnish (and there are many such characters at least in
Icelandic).
- The algorithm assumes a Finnish keyboard layout when it tries to replace
mistyped letters.
- There are some generators that are probably not appropriate for other
languages. For example, SuggestionGeneratorVowelChange tries to correct errors
where multiple vowels are wrong because Finnish vowel harmony rules have not
been followed correctly.
- Malaga is relatively slow at analyzing words, which is why the current
generators don't even try all suggestions that are at Damerau–Levenshtein
edit distance 1 from the input string. You should check how fast Lttoolbox is
with the available morphologies. If it is fast, you can try a different
approach to generating the suggestions: generate everything within distance 1
(and some more), rank the results (you would have to design a method for
doing this) and return the top candidates.
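
To make the last point more concrete, the approach could be sketched roughly
as follows, in Python for brevity. This is only an illustration under
assumptions, not existing libvoikko or Lttoolbox code: edits1, suggest,
ALPHABET and LEXICON are hypothetical names, the toy LEXICON stands in for a
real morphological analyzer, and ranking by a made-up frequency count is just
one possible ranking method.

```python
from typing import Callable, List, Set


def edits1(word: str, alphabet: str) -> Set[str]:
    """Return all strings at Damerau-Levenshtein distance 1 from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Drop the input itself (e.g. replacing a letter with the same letter).
    return (deletes | transposes | replaces | inserts) - {word}


def suggest(
    word: str,
    alphabet: str,
    is_valid: Callable[[str], bool],
    score: Callable[[str], float],
    limit: int = 5,
) -> List[str]:
    """Generate distance-1 candidates, keep valid ones, return the best."""
    candidates = [c for c in edits1(word, alphabet) if is_valid(c)]
    # Highest score first; ties are broken alphabetically for stable output.
    candidates.sort(key=lambda c: (-score(c), c))
    return candidates[:limit]


# Hypothetical stand-ins: the alphabet is language-specific, and LEXICON
# replaces the morphological analyzer with a word -> frequency table.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäö"
LEXICON = {"talo": 50, "tali": 5, "tapa": 20, "kala": 30}

if __name__ == "__main__":
    print(suggest("talp", ALPHABET, LEXICON.__contains__, LEXICON.get))
    # prints ['talo', 'tali']
```

Passing the alphabet, the validity check and the scoring function as
parameters keeps the generator independent of Finnish, so a different
language only needs a different alphabet and analyzer.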

Harri


