[libvoikko] HFST speller lexicon spec - RC1

Harri Pitkänen hatapitk at iki.fi
Wed Aug 10 18:55:21 EEST 2011

On Tuesday 09 August 2011, Flammie Pirinen wrote:
> Please try your best to break it; I haven't performed even the most
> rudimentary cleanups to the code so it must at the very minimum leak
> memory and pollute cwd at the moment. The example zip files are at:
> <http://www.helsinki.fi/%7Etapirine/tmp/zhfst/>.

I tested with the Swedish example and it seems to work reliably, at least. I 
know it is likely too early to compare this with anything, but since that is 
what I like to do, here are some observations:

- Runs a lot faster (about 8 times faster) than Swedish Hunspell.
- Uses a lot more memory (vmdata size is about 50 times higher).
- On-disk size of the dictionary is about 20 times larger.
- Seems to miss more correct words than Hunspell.
- Spelling correction seems to produce lots of results for short words and 
nothing for slightly longer words.

The list of words I used in these tests does not represent real-life Swedish 
at all, which will affect the results. I'm very interested in seeing results 
from similar tests for other HFST transducers and Hunspell dictionaries.
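
For anyone who wants to repeat this kind of comparison, here is a minimal 
sketch of a timing harness in Python. It is not the script used for the 
numbers above. The function name benchmark and the toy lexicon are made up 
for illustration, and the checker is passed in as a plain callable, so no 
particular libvoikko or Hunspell binding is assumed. It only covers speed and 
the number of accepted words; memory use and on-disk size would have to be 
measured separately.

```python
import time
from typing import Callable, Dict, Iterable


def benchmark(check: Callable[[str], bool], words: Iterable[str]) -> Dict[str, float]:
    """Run a spell-check callable over a word list and collect simple counts.

    `check` is a hypothetical stand-in for whatever speller is being tested:
    it takes one word and returns True if the speller accepts it.
    """
    word_list = list(words)  # materialise first so reading input is not timed
    start = time.perf_counter()
    accepted = sum(1 for word in word_list if check(word))
    elapsed = time.perf_counter() - start
    return {
        "words": len(word_list),
        "accepted": accepted,
        "rejected": len(word_list) - accepted,
        "seconds": elapsed,
    }


# Toy stand-in for a real speller: a set lookup over three Swedish words.
toy_lexicon = {"hej", "ord", "bok"}
result = benchmark(lambda word: word in toy_lexicon, ["hej", "ordd", "bok"])
print(result["accepted"], "accepted,", result["rejected"], "rejected")
```

Running the same word list through two different checkers and comparing the 
"seconds" and "rejected" fields gives a rough speed and coverage comparison 
of the kind listed above.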


