[libvoikko] Another language for voikko: Avar!
Harri Pitkänen
hatapitk at iki.fi
Sun Mar 9 19:51:24 EET 2014
On Sunday 09 March 2014 12:43:05 Francis Tyers wrote:
> It would be good to be able to release spellcheckers with a kind of
> spellrelax where the Latin characters do not cause spelling errors
> (really this isn't a spelling error, it's an encoding error).
>
> Any thoughts on how to do this ? -- Most of the errors you see in that
> text are because of this problem.
Libvoikko does have some code to handle similar (language-independent)
situations where certain Unicode characters are essentially equal from the
point of view of spell checking. These are mostly related to combining
diacritical marks, ligatures and hyphens. We normalize the words before
sending them to the speller backend. We also attempt (in a very limited way)
to ensure that spelling suggestions do not contain encoding changes that are
unrelated to fixing the actual spelling error. That is, if the word contains a
non-breaking hyphen and there is a spelling error, the suggestions will just
fix the spelling error without changing the non-breaking hyphen into a
hyphen-minus.
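To make the idea concrete, here is a rough Python sketch of that kind of
normalization. It is not libvoikko's actual code; the function names and the
character table are made up for illustration, and NFKC is only an assumed
stand-in for whatever rules libvoikko really applies.

```python
import unicodedata

# Hypothetical table: hyphen-like characters treated as equal to hyphen-minus.
HYPHEN_EQUIVALENTS = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
}


def normalize_for_speller(word: str) -> str:
    """Return the form of `word` that would be sent to the speller backend."""
    # NFKC composes combining marks and expands compatibility ligatures.
    word = unicodedata.normalize("NFKC", word)
    return "".join(HYPHEN_EQUIVALENTS.get(ch, ch) for ch in word)


def restore_hyphens(original: str, suggestion: str) -> str:
    """Put the user's own hyphen characters back into a suggestion.

    Only done when the hyphen counts match; otherwise the suggestion is
    returned unchanged, which keeps the heuristic deliberately limited.
    """
    hyphens = [ch for ch in original if ch == "-" or ch in HYPHEN_EQUIVALENTS]
    if len(hyphens) != suggestion.count("-"):
        return suggestion
    remaining = iter(hyphens)
    return "".join(next(remaining) if ch == "-" else ch for ch in suggestion)
```

With this sketch, a misspelled word containing U+2011 would be checked in its
normalized form, and a suggestion coming back with a hyphen-minus would get
the original U+2011 restored before it is shown to the user.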
I think that supporting similar language-dependent rules within libvoikko
would be useful. But if we want ZHFST spellers to be fully self-contained, then
the information about such relaxed spelling rules or transformations would
need to be stored in the ZHFST file. So it might be easier to handle this
completely within hfst-ospell. A third option is to modify the
transducers so that the alternative characters are recognized directly.
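As a sketch of what a language-dependent rule could look like if it were
applied before the speller is called, consider the following. Everything here
is an assumption for illustration: the table, the language code "av", the
choice of mapping Latin "I" and "l" to the Cyrillic palochka, and the rule of
only relaxing words that already contain Cyrillic letters. Neither libvoikko
nor hfst-ospell is known to provide this today.

```python
# Hypothetical per-language relaxation table: Latin look-alikes mapped to the
# Cyrillic character the writer most likely intended.
RELAXATIONS = {
    "av": {
        "I": "\u04c0",  # LATIN CAPITAL LETTER I -> CYRILLIC LETTER PALOCHKA
        "l": "\u04c0",  # LATIN SMALL LETTER L  -> CYRILLIC LETTER PALOCHKA
    },
}


def is_cyrillic(ch: str) -> bool:
    """True for characters in the basic Cyrillic block (U+0400..U+04FF)."""
    return "\u0400" <= ch <= "\u04ff"


def relax(word: str, lang: str) -> str:
    """Apply the relaxation table for `lang`, if there is one.

    Words without any Cyrillic letter are left alone so that genuinely
    Latin-script words are not rewritten.
    """
    table = RELAXATIONS.get(lang)
    if not table or not any(is_cyrillic(ch) for ch in word):
        return word
    return "".join(table.get(ch, ch) for ch in word)
```

If such a table were stored inside the ZHFST file instead, the same lookup
could be done by whichever component reads that file, which is the trade-off
between the first two options.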
So, for me it is OK to have this implemented within libvoikko but I will let
others decide if that is the best solution.
Harri