[libvoikko] Sámi/HFST
Harri Pitkänen
hatapitk at iki.fi
Mon Jun 7 11:14:57 EEST 2010
On Monday 07 June 2010, Harri Pitkänen wrote:
> That solved the problem and I was able to do
> some basic spell checking:
I did a bit more testing by comparing Voikko (binary sme transducer from
hfst.sf.net) and Hunspell (hunspell-se 1.0~beta6.20081222-1.1 from Debian).
From a performance point of view, HFST/Voikko seems to be much better than
Hunspell:
- Checking all unique words from Sámi Wikipedia took 9.5 seconds with Voikko
and 20.5 seconds with Hunspell. These numbers include the time needed to
initialize the speller and perform the actual checking.
- Use of non-shareable memory after loading the speller was 26 MB with Voikko
and 150 MB with Hunspell. Both programs used about 25 MB of shareable memory
on top of those numbers.
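For anyone who wants to repeat this kind of timing, here is a minimal Python sketch of a measurement that covers both speller initialization and the checking itself. It is not the script that produced the numbers above. The names time_spell_check and make_toy_speller are hypothetical, and the toy speller is only a stand-in: a real run would pass a factory that loads Voikko or Hunspell and returns a function mapping a word to True or False.

```python
import time


def time_spell_check(make_speller, words):
    """Return (elapsed_seconds, rejected_count).

    The timer starts before the speller is created, so the result
    includes start-up cost as well as the checking itself.
    """
    start = time.perf_counter()
    check = make_speller()  # speller initialization is part of the timing
    rejected = sum(1 for word in words if not check(word))
    elapsed = time.perf_counter() - start
    return elapsed, rejected


def make_toy_speller():
    """Hypothetical stand-in for a real speller: accepts a fixed word set."""
    known = {"sámi", "giella"}
    return lambda word: word.lower() in known


if __name__ == "__main__":
    # A set removes duplicates, mirroring "all unique words" in the test.
    words = sorted({"Sámi", "giella", "gielaa", "giella"})
    elapsed, rejected = time_spell_check(make_toy_speller, words)
    print(f"{len(words)} unique words, {rejected} rejected, {elapsed:.3f} s")
```

Memory use is not covered by this sketch; it would have to be read from the operating system after the speller has been loaded.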
It should be noted that the configuration used with Voikko does not support
spelling suggestions at all. Depending on how those are implemented, the
memory footprint of Voikko could end up being much larger than in this test.
Starting Hunspell with the Sámi language caused lots of errors like this:
error: line 3512: flag id 65529 is too large (max: 65509)
I cannot say which one is linguistically better. Both can be used in OOo, so I
made a screenshot that contains some Northern Sámi text checked with both
spellers:
http://www.puimula.org/htp/testing/hfst/openoffice-sme-spelling.png
Harri