[libvoikko] Another language for voikko: Avar!

Harri Pitkänen hatapitk at iki.fi
Sun Mar 9 19:51:24 EET 2014


On Sunday 09 March 2014 12:43:05 Francis Tyers wrote:
> It would be good to be able to release spellcheckers with a kind of
> spellrelax where the Latin characters do not cause spelling errors
> (really this isn't a spelling error, it's an encoding error).
> 
> Any thoughts on how to do this ? -- Most of the errors you see in that
> text are because of this problem.

Libvoikko does have some code to handle similar (language-independent) 
situations where certain Unicode characters are essentially equal from the 
point of view of spell checking. These are mostly related to combining 
diacritical marks, ligatures and hyphens. We normalize the words before 
sending them to the speller backend. We also attempt (in a very limited way) 
to ensure that spelling suggestions do not contain encoding changes that are 
unrelated to fixing the actual spelling error. That is, if the word contains a 
non-breaking hyphen and there is a spelling error, the suggestions will just 
fix the spelling error without changing the non-breaking hyphen into a hyphen-
minus.
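
To illustrate the idea, here is a minimal sketch in Python. It is not the
actual libvoikko code (which is C++); the names HYPHEN_LIKE,
normalize_for_speller and restore_hyphens are made up for this example, and
the set of characters handled is only an assumption.

```python
import unicodedata

# Hypothetical table: characters treated as equal to hyphen-minus for
# spell checking. A soft hyphen is simply dropped.
HYPHEN_LIKE = {"\u2010": "-", "\u2011": "-", "\u00ad": ""}


def normalize_for_speller(word):
    """Return the form of the word that is sent to the speller backend."""
    # NFKC composes base letter + combining mark and expands ligatures.
    composed = unicodedata.normalize("NFKC", word)
    return "".join(HYPHEN_LIKE.get(ch, ch) for ch in composed)


def restore_hyphens(original, suggestion):
    """Put the user's own hyphen characters back into a suggestion."""
    used = [ch for ch in original if HYPHEN_LIKE.get(ch, ch) == "-"]
    if len(used) != suggestion.count("-"):
        # The suggestion changes the hyphenation itself, leave it alone.
        return suggestion
    hyphens = iter(used)
    return "".join(next(hyphens) if ch == "-" else ch for ch in suggestion)
```

With this, a word written with a non-breaking hyphen (U+2011) is checked as
if it had a hyphen-minus, and a suggestion such as "ab-cd" is returned to the
user with the non-breaking hyphen still in place.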

I think that supporting similar language-dependent rules within libvoikko 
would be useful. But if we want ZHFST spellers to be fully self-contained, then 
the information about such relaxed spelling rules or transformations would 
need to be stored in the ZHFST file. So it might be easier to handle this 
completely within hfst-ospell. A third option is to modify the 
transducers so that the alternative characters are recognized directly.
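
As a rough sketch of what a language-dependent rule could look like if it
were applied before the word reaches the speller backend: the table below is
purely illustrative. The language code "av", the names RELAX_RULES and relax,
and the specific character pairs are assumptions for this example and are not
taken from any existing speller or from the ZHFST format.

```python
# Hypothetical per-language relaxation table: Latin (or digit) lookalikes
# mapped to the Cyrillic character that was probably intended.
RELAX_RULES = {
    "av": {
        "I": "\u04c0",  # Latin capital I -> Cyrillic palochka
        "l": "\u04c0",  # Latin small l   -> Cyrillic palochka
        "1": "\u04c0",  # digit one       -> Cyrillic palochka
        "a": "\u0430",  # Latin a -> Cyrillic a
        "o": "\u043e",  # Latin o -> Cyrillic o
        "e": "\u0435",  # Latin e -> Cyrillic ie
    },
}


def is_cyrillic(ch):
    """True for characters in the basic Cyrillic block (U+0400..U+04FF)."""
    return "\u0400" <= ch <= "\u04ff"


def relax(word, lang):
    """Replace lookalike characters, but only in words that already
    contain at least one Cyrillic letter."""
    table = RELAX_RULES.get(lang, {})
    if not table or not any(is_cyrillic(ch) for ch in word):
        return word
    return "".join(table.get(ch, ch) for ch in word)
```

Restricting the replacement to words that already contain Cyrillic letters
is one possible way to avoid rewriting genuine Latin-script words; whether
that heuristic is good enough would need to be tested on real text.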

So, for me it is OK to have this implemented within libvoikko but I will let 
others decide if that is the best solution.

Harri

