[libvoikko] Two identical suggestions in the suggestion list bug

Sjur Moshagen sjurnm at mac.com
Thu Sep 3 08:26:30 EEST 2015


Hello,

I have found a small bug in the interaction between libvoikko and hfst: the user ends up seeing two identical suggestions in the suggestion list. It is not a major issue, but it would be nice to get it fixed.

What happens is that hfst-ospell produces suggestions that differ in capitalisation:

$ echo Adjitt | hfst-ospell -S build/spellers/tools/spellcheckers/fstbased/hfst/se.zhfst 
"Adjitt" is NOT in the lexicon:
Corrections for "Adjitt":
Addit    14.101562
Ádjit-    15.506594
Ádjit    15.506594
ádjit    15.506594

Now, since the input had initial upper case, libvoikko changes the case of the last suggestion to follow the input. The result is two identical suggestions:

$ echo Adjitt | voikkospell -s -d se -p build/spellers/tools/spellcheckers/fstbased/hfst/
W: Adjitt
S: Addit
S: Ádjit-
S: Ádjit
S: Ádjit

It seems reasonable to suggest only words with an initial uppercase letter when the input also has one, but if two lexically different suggestions become identical due to uppercasing, the later one should be discarded.
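
To illustrate the behaviour I have in mind, here is a minimal sketch in Python. It is not libvoikko's actual code, and the function name match_case_and_dedupe is made up for this example. It assumes that case adjustment simply uppercases the first character of each suggestion when the input starts with an uppercase letter, and it then keeps only the first occurrence of each resulting string:

```python
def match_case_and_dedupe(word, suggestions):
    """Hypothetical helper (not part of libvoikko or hfst-ospell).

    Uppercases the first character of every suggestion when the input
    word starts with an uppercase letter, then removes duplicates that
    this adjustment may have created, keeping the original order.
    """
    if word[:1].isupper():
        suggestions = [s[:1].upper() + s[1:] for s in suggestions]

    seen = set()
    unique = []
    for s in suggestions:
        if s not in seen:  # keep the first occurrence, drop later ones
            seen.add(s)
            unique.append(s)
    return unique


# Suggestions as returned by hfst-ospell in the example above
print(match_case_and_dedupe("Adjitt", ["Addit", "Ádjit-", "Ádjit", "ádjit"]))
# prints ['Addit', 'Ádjit-', 'Ádjit']
```

Since the suggestions are ordered by weight, keeping the first occurrence means the duplicate is removed without changing the ranking of the remaining suggestions.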

Sjur


