[libvoikko] Two identical suggestions in the suggestion list bug
Sjur Moshagen
sjurnm at mac.com
Thu Sep 3 08:26:30 EEST 2015
Hello,
I have found a small bug in the interaction between libvoikko and hfst: the user ends up seeing two identical suggestions in the suggestion list. Not a major thing, but it would be nice to get it fixed.
What happens is that hfst-ospell produces suggestions that differ only in capitalisation (the last two below):
$ echo Adjitt | hfst-ospell -S build/spellers/tools/spellcheckers/fstbased/hfst/se.zhfst
"Adjitt" is NOT in the lexicon:
Corrections for "Adjitt":
Addit 14.101562
Ádjit- 15.506594
Ádjit 15.506594
ádjit 15.506594
Now, since the input had initial upper case, libvoikko changes the case of the last suggestion to follow the input. The result is two identical suggestions:
$ echo Adjitt | voikkospell -s -d se -p build/spellers/tools/spellcheckers/fstbased/hfst/
W: Adjitt
S: Addit
S: Ádjit-
S: Ádjit
S: Ádjit
It seems reasonable to suggest only words with an initial uppercase letter when the input also has one, but if two lexically different suggestions become identical due to uppercasing, the last one should be discarded.
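To illustrate the behaviour I have in mind, here is a minimal sketch in Python. It is not libvoikko's actual code (which is C++), and the function name match_input_case is made up for the example. It assumes the case adjustment is simply "uppercase the first letter of each suggestion when the input starts with an uppercase letter", and then keeps only the first occurrence of each resulting string:

```python
def match_input_case(word, suggestions):
    """Hypothetical helper: adapt suggestions to the input's initial case,
    then drop any suggestion that has become a duplicate."""
    capitalise = word[:1].isupper()
    seen = set()
    result = []
    for s in suggestions:
        if capitalise:
            # Uppercase only the first character, leave the rest untouched.
            s = s[:1].upper() + s[1:]
        if s not in seen:
            # Keep the first occurrence, so the original ranking is preserved.
            seen.add(s)
            result.append(s)
    return result


# The suggestions hfst-ospell returned for "Adjitt" in the example above.
print(match_input_case("Adjitt", ["Addit", "Ádjit-", "Ádjit", "ádjit"]))
```

With the hfst-ospell output above, this leaves Addit, Ádjit- and Ádjit, in that order. Doing the duplicate check after the case change (rather than before) is the important part, since the two suggestions are still distinct when they come out of hfst-ospell.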
Sjur