[libvoikko] HFST backend is no longer experimental

Harri Pitkänen hatapitk at iki.fi
Mon Mar 18 17:28:19 EET 2013


On Monday 18 March 2013, Sjur Moshagen wrote:
> Using voikkospell -s I get the following:
> 
> Такаси
> W: Такаси
> S: Такси
> S: Такаев
> S: Таксин
> S: Хакас
> S: Хакас
> 
> The interesting part here is the last two suggestions - they are identical
> (I have checked with a Unicode string analyser - the chars have the exact
> same unicode values in the two suggestions, so no spurious latin char or
> some such).
> 
> Is this a bug in voikko or in hfst-ospell?

hfst-ospell gives me the following suggestions:

Corrections for "Такаси":
Такси    1
Накас    2
Таҥаса    2
хакас    2
такси    2
накас    2
Хакас    2
Таҥас    2
Таҥасе    2
Таксин    2
Таксе    2
Таксим    2

So there seem to be two issues:

- Suggestions are not sorted as they should be. It looks like libvoikko uses 
the ospell library in a way that ignores the weights. I'll fix that.

- Libvoikko tries to convert each suggestion to match the case of the 
original word, as long as the result is a valid word. Ospell, however, returns 
"хакас" and "Хакас" as separate suggestions, so after case conversion "Хакас" 
is suggested twice. Here I'm not sure what to do. If you think we should 
just trust hfst-ospell, I can fix libvoikko to not touch the character case. 
But then I also think that suggesting the same word with several different 
capitalizations is generally not a good idea.
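
To make the two issues concrete, here is a minimal sketch in Python of one way 
the suggestions could be handled: sort by weight, apply the case conversion, 
then drop duplicates. This is not libvoikko code (libvoikko is C++). The names 
match_case, rank_suggestions and is_valid are hypothetical, and is_valid only 
stands in for the real spell check. The data is copied from the hfst-ospell 
output above.

```python
from typing import Callable, List, Tuple

# (word, weight) pairs copied from the hfst-ospell output quoted above.
CORRECTIONS: List[Tuple[str, int]] = [
    ("Такси", 1), ("Накас", 2), ("Таҥаса", 2), ("хакас", 2),
    ("такси", 2), ("накас", 2), ("Хакас", 2), ("Таҥас", 2),
    ("Таҥасе", 2), ("Таксин", 2), ("Таксе", 2), ("Таксим", 2),
]


def match_case(original: str, suggestion: str) -> str:
    """Capitalize the suggestion if the original word is capitalized."""
    if original[:1].isupper():
        return suggestion[:1].upper() + suggestion[1:]
    return suggestion


def rank_suggestions(
    original: str,
    corrections: List[Tuple[str, int]],
    is_valid: Callable[[str], bool],
) -> List[str]:
    """Sort by weight, convert case where valid, and remove duplicates."""
    # sorted() is stable, so equal weights keep the order ospell gave them.
    ranked = sorted(corrections, key=lambda pair: pair[1])
    seen = set()
    result = []
    for word, _weight in ranked:
        candidate = match_case(original, word)
        if not is_valid(candidate):
            # Fall back to the suggestion exactly as ospell returned it.
            candidate = word
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


if __name__ == "__main__":
    # Assumption for the demo: every case-converted form is a valid word.
    for suggestion in rank_suggestions("Такаси", CORRECTIONS, lambda w: True):
        print("S:", suggestion)
```

With this approach "Хакас" appears only once, and the lowest-weight 
suggestion stays first. Whether duplicates should be merged like this or the 
case conversion dropped entirely is still the open question.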

Harri

