[libvoikko] Lttoolbox - libvoikko backend integration issues?
hatapitk at iki.fi
Sat Apr 3 12:01:43 EEST 2010
On Friday 02 April 2010 18:34:12 David Cheah wrote:
> Have been working to build Voikko in MinGW as per the Windows instructions
> on the Voikko page, sadly the Malaga installation isn't giving me any files
> at all - currently looking into it. Think i should be able to find out
> whats going wrong over the weekend.
I don't know if you noticed but yesterday I added a link to the Windows
instructions page pointing to more up to date instructions written by Marko
Wallin. They might help, although nothing has changed on the Malaga side, so
the problem with it is completely new to me. If you can figure it out, let us
know how you did it and we will update the instructions.
The current version of libvoikko does not need the Malaga library anymore;
Malaga is needed only for generating the Finnish morphology. Therefore I think
you don't need to spend too much time on this problem if it turns out to be
something very complicated. You can download a pre-built Malaga morphology
from here:
> Also, I've been looking at the spelling suggestions in
> src/spellchecker/suggestions, and have a few questions.
> Am I right in saying that the task at hand is basically to write additional
> methods and new SuggestionGenerator(etc) files into the
> src/spellchecker/suggestions folder, which would leave the methods
> involving SuggestionStrategy and SuggestionStatus mostly unmodified?
Yes, this is exactly what needs to be done. Additionally you need to add the
new generator to SuggestionGeneratorFactory so that it gets actually used.
> Also, what additional features/new implementations would be good to have in
> the suggestions algorithm? Or perhaps what is the current reason why there
> is a need for a better suggestions algorithm?
The current algorithms produce suggestions in a way that is specifically
designed for the Finnish language:
- No suggestions would be generated that insert or replace characters that are
not used in Finnish (and there are many such characters at least in
- The algorithm assumes a Finnish keyboard layout when it tries to replace
- There are some generators that are probably not appropriate for other
languages. For example SuggestionGeneratorVowelChange tries to correct errors
where multiple vowels are wrong because Finnish vowel harmony rules have not
been correctly followed.
- Malaga is relatively slow at analyzing words, which is why the current
generators don't even try all suggestions that are at Damerau–Levenshtein
edit distance 1 from the input string. You should check how fast Lttoolbox is
with the available morphologies. If it is fast, you can try a different
approach to generating the suggestions: generate everything within distance 1
(and some more), rank the results (you would have to design a method for
doing this) and return the top candidates.
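
To make the "generate everything, rank, return the top candidates" idea
concrete, here is a rough sketch in Python rather than in libvoikko's C++. It
is not libvoikko code: the names edits1, suggest, LEXICON and ALPHABET are
made up for illustration, the toy word list stands in for a real
morphological analyzer such as Lttoolbox, and ranking by a made-up frequency
count is only one possible method.

```python
from typing import Callable, List, Set

# Hypothetical stand-ins: a real backend would ask the morphological
# analyzer whether a word is valid instead of consulting a dictionary.
LEXICON = {"kissa": 10, "kisa": 3, "kassi": 5}  # word -> invented frequency
ALPHABET = "aiks"  # characters allowed in insertions and replacements


def edits1(word: str, alphabet: str) -> Set[str]:
    """Return all strings at Damerau-Levenshtein distance 1 from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Drop the input itself (a replacement with the same character
    # or a swap of two equal letters reproduces it).
    return (deletes | transposes | replaces | inserts) - {word}


def suggest(
    word: str,
    is_valid: Callable[[str], bool],
    rank_key: Callable[[str], object],
    alphabet: str = ALPHABET,
    limit: int = 5,
) -> List[str]:
    """Generate every distance-1 candidate, keep valid ones, rank, truncate."""
    candidates = [c for c in edits1(word, alphabet) if is_valid(c)]
    return sorted(candidates, key=rank_key)[:limit]


# Example: rank by descending toy frequency, ties broken alphabetically.
result = suggest(
    "kisas",
    is_valid=lambda w: w in LEXICON,
    rank_key=lambda w: (-LEXICON[w], w),
)
print(result)
```

The number of candidates grows with both word length and alphabet size, so
whether this is practical depends entirely on how fast the analyzer answers
the is_valid question.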