[libvoikko] Sámi/HFST

Harri Pitkänen hatapitk at iki.fi
Mon Jun 7 11:14:57 EEST 2010


On Monday 07 June 2010, Harri Pitkänen wrote:
> That solved the problem and I was able to do 
> some basic spell checking:

I did a bit more testing by comparing Voikko (binary sme transducer from 
hfst.sf.net) and Hunspell (hunspell-se 1.0~beta6.20081222-1.1 from Debian). 
From a performance point of view, HFST/Voikko seems to be much better than 
Hunspell:

- Checking all unique words from the Sámi Wikipedia took 9.5 seconds with 
Voikko and 20.5 seconds with Hunspell. These numbers include the time needed 
to initialize the speller and perform the actual checking.

- Use of non-shareable memory after loading the speller was 26 MB with Voikko 
and 150 MB with Hunspell. Both programs used about 25 MB of shareable memory 
on top of those numbers.
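
For anyone who wants to repeat this kind of comparison, here is a minimal 
sketch of how such numbers could be collected. It is not the script used for 
the figures above. The names time_spell_check, make_speller and 
private_and_shared_kb are hypothetical, and the memory helper assumes a Linux 
/proc/<pid>/smaps dump, which may differ from how the figures above were 
obtained.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical helper: make_speller is any zero-argument function that
# returns a "is this word accepted?" callable for the speller under test.
def time_spell_check(
    make_speller: Callable[[], Callable[[str], bool]],
    words: Iterable[str],
) -> Tuple[float, int]:
    """Return (elapsed seconds, number of rejected words).

    The clock starts before the speller is created, so the result
    includes initialization as well as the actual checking.
    """
    start = time.perf_counter()
    spell = make_speller()  # initialization is part of the timing
    rejected = sum(1 for word in words if not spell(word))
    return time.perf_counter() - start, rejected


# Assumption: memory is read from a Linux smaps dump, whose per-mapping
# fields look like "Private_Dirty:        8 kB".
_PRIVATE_KEYS = {"Private_Clean:", "Private_Dirty:"}
_SHARED_KEYS = {"Shared_Clean:", "Shared_Dirty:"}

def private_and_shared_kb(smaps_text: str) -> Tuple[int, int]:
    """Sum private (non-shareable) and shared memory in kB over all mappings."""
    private = shared = 0
    for line in smaps_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # mapping header lines and other fields are skipped
        if fields[0] in _PRIVATE_KEYS:
            private += int(fields[1])
        elif fields[0] in _SHARED_KEYS:
            shared += int(fields[1])
    return private, shared
```

Passing a factory instead of a ready-made speller keeps the start-up cost 
inside the measured interval, which matches how the timings above are 
defined.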

It should be noted that the configuration used with Voikko does not support 
spelling suggestions at all. Depending on how those would be implemented, 
the memory footprint for Voikko could end up being much larger than in this 
test.

Starting Hunspell with the Sámi language produced many errors like this:

  error: line 3512: flag id 65529 is too large (max: 65509)

I cannot say which one is linguistically better. Both can be used in 
OpenOffice.org, so I made a screenshot that contains some Northern Sámi text 
checked with both spellers:

  http://www.puimula.org/htp/testing/hfst/openoffice-sme-spelling.png

Harri


