[libvoikko] HFST backend is no longer experimental
hatapitk at iki.fi
Mon Mar 18 17:28:19 EET 2013
On Monday 18 March 2013, Sjur Moshagen wrote:
> Using voikkospell -s I get the following:
> W: Такаси
> S: Такси
> S: Такаев
> S: Таксин
> S: Хакас
> S: Хакас
> The interesting part here is the last two suggestions - they are identical
> (I have checked with a Unicode string analyser - the chars have the exact
> same unicode values in the two suggestions, so no spurious latin char or
> some such).
> Is this a bug in voikko or in hfst-ospell?
hfst-ospell gives me the following suggestions:
Corrections for "Такаси":
So there seem to be two issues:
- Suggestions are not sorted as they should be. It looks like libvoikko uses the
ospell library in a way that ignores the weights. I'll fix that.
- Libvoikko will try to convert the suggestions to match the case of the
original word (if the conversion results in a valid word). Ospell, however,
returns "хакас" and "Хакас" as separate suggestions, which then results in
"Хакас" being suggested twice. Here I'm not sure what to do. If you think we
should just trust hfst-ospell, I can fix libvoikko to not touch the character case.
But then I also think that suggesting the same word with multiple different
capitalizations may not generally be a good idea.
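
To illustrate the two points above, here is a minimal sketch in Python. It is not libvoikko or hfst-ospell code: the function name merge_suggestions, the is_valid callback and the weights in the usage example are all made up for illustration, and it assumes that a lower weight means a better suggestion. It sorts the candidates by weight, converts each one to the case of the original word when that gives a valid word, and drops any suggestion that has already been emitted.

```python
def merge_suggestions(weighted, original, is_valid=lambda word: True):
    """Order suggestions by weight, match the original's case, drop duplicates.

    weighted: list of (word, weight) pairs; ASSUMPTION: lower weight is better.
    original: the misspelled input word.
    is_valid: hypothetical callback standing in for a spell check of the
              case-converted word; it accepts everything by default.
    """
    # Best (lowest-weight) candidates first, instead of ignoring the weights.
    ordered = sorted(weighted, key=lambda pair: pair[1])

    capitalize = original[:1].isupper()
    seen = set()
    result = []
    for word, _weight in ordered:
        if capitalize:
            candidate = word[:1].upper() + word[1:]
            # Keep the converted form only if it is still a valid word.
            if is_valid(candidate):
                word = candidate
        # "хакас" and "Хакас" collapse to the same string after conversion,
        # so the second one is skipped here.
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Usage with invented weights (not real hfst-ospell output):
candidates = [("хакас", 12.0), ("Такси", 5.0), ("Хакас", 12.5), ("Такаев", 8.0)]
print(merge_suggestions(candidates, "Такаси"))
```

Deduplicating after the case conversion keeps the first (best-weighted) occurrence, so the ordering from the weights is preserved.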