[libvoikko] Sámi/HFST
Harri Pitkänen
hatapitk at iki.fi
Mon Jun 7 13:33:06 EEST 2010
On Monday 07 June 2010, Kevin Brubeck Unhammer wrote:
> (All those names were recognised when I run them through hfst-lookup,
> strange.)
This is because HfstSpeller in libvoikko does not support the optimization
that would allow a single backend call, spell("matti"), to determine which of
the following holds:
SPELL_OK: "matti" and "Matti" are both correct
SPELL_CAP_FIRST: "Matti" is correct but "matti" is not
SPELL_FAILED: neither "matti" nor "Matti" is correct
Currently HfstSpeller returns SPELL_FAILED when it should return SPELL_CAP_FIRST.
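
To make the three outcomes concrete, here is a minimal Python sketch of the
classification done with two exact-case lookups instead of one optimized
backend call. It is not libvoikko code: classify(), the accepts callback and
TOY_LEXICON are hypothetical names, and the sketch assumes that a word accepted
in lower case is also acceptable with a capitalized first letter.

```python
from enum import Enum
from typing import Callable


class SpellResult(Enum):
    SPELL_OK = "ok"                # lower-case form accepted
    SPELL_CAP_FIRST = "cap_first"  # only the capitalized form accepted
    SPELL_FAILED = "failed"        # neither form accepted


def classify(word: str, accepts: Callable[[str], bool]) -> SpellResult:
    """Classify a lower-case word using at most two exact-case lookups.

    `accepts` stands in for a case-sensitive backend (hypothetical).
    """
    if accepts(word):
        # Assumption: a word valid in lower case is also valid capitalized
        # (for example at the start of a sentence).
        return SpellResult.SPELL_OK
    capitalized = word[:1].upper() + word[1:]
    if capitalized != word and accepts(capitalized):
        return SpellResult.SPELL_CAP_FIRST
    return SpellResult.SPELL_FAILED


# Toy stand-in for a transducer: an exact-match word list.
TOY_LEXICON = {"Matti", "talo"}
accepts = TOY_LEXICON.__contains__

print(classify("matti", accepts))
print(classify("talo", accepts))
print(classify("xyzzy", accepts))
```

With a backend that only answers exact-case queries, the second lookup is what
distinguishes SPELL_CAP_FIRST from SPELL_FAILED.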
We could fix HfstSpeller to handle this particular case, but that would not
help with more complex capitalization scenarios. A spell checker should be able
to check words written COMPLETELY IN UPPER CASE. In that case "MATTI" is correct
if any of "matti", "Matti", "MATTI", "mAtti", ... is correct. Checking all
those combinations separately is not possible within reasonable time (an
n-letter word has up to 2^n case variants), but right now neither HFST nor the
Sámi transducer appears to support any other way of doing case-insensitive
checking.
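
The growth in the number of combinations can be illustrated with a short
Python sketch. The helper case_variants() is a hypothetical name introduced
only for this illustration; it enumerates every upper/lower-case spelling of a
word, which is exactly the brute-force approach that does not scale.

```python
from itertools import product


def case_variants(word: str) -> list[str]:
    """Return every combination of upper/lower case for the letters of `word`."""
    # For each character, collect its distinct lower- and upper-case forms.
    choices = [sorted({ch.lower(), ch.upper()}) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


variants = case_variants("MATTI")
print(len(variants))          # 2**5 = 32 candidate spellings
print("Matti" in variants)    # True
print(2 ** 20)                # 1048576 candidates for a 20-letter word
```

Each candidate would need its own backend lookup, so the cost doubles with
every additional letter.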
I suppose we should at least add some support for backends (or languages) that
do not support case-insensitive checking. This would mean disabling the
capitalized-first-letter optimization for such backends and probably just
stating that case-insensitive (all-caps) checking may not work correctly when
they are used.
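
As one possible reading of this proposal, the Python sketch below checks a word
against an exact-case backend by trying only a few likely spellings. It is an
assumption for illustration, not existing libvoikko behaviour:
check_exact_case_backend() and TOY_LEXICON are hypothetical names, and the
limited fallback for all-caps input goes slightly beyond what is proposed
above.

```python
from typing import Callable


def check_exact_case_backend(word: str, accepts: Callable[[str], bool]) -> bool:
    """Check `word` with a backend that only answers exact-case queries."""
    candidates = [word]
    if word[:1].isupper():
        # A capitalized word may be a lower-case word at a sentence start.
        candidates.append(word[:1].lower() + word[1:])
    if len(word) > 1 and word.isupper():
        # All-caps input: try only the two most common spellings rather
        # than every case combination.
        candidates += [word.lower(), word.capitalize()]
    return any(accepts(candidate) for candidate in candidates)


# Toy stand-in for a case-sensitive transducer.
TOY_LEXICON = {"Matti", "talo", "McDonald"}
accepts = TOY_LEXICON.__contains__

print(check_exact_case_backend("Matti", accepts))     # True
print(check_exact_case_backend("matti", accepts))     # False
print(check_exact_case_backend("MATTI", accepts))     # True
print(check_exact_case_backend("MCDONALD", accepts))  # False: mixed case is missed
```

The last call shows the limitation mentioned above: an all-caps word whose
dictionary form has mixed case is rejected, so all-caps checking would remain
only approximately correct with such a backend.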
Harri