[libvoikko] HFST speller lexicon spec - RC1
Harri Pitkänen
hatapitk at iki.fi
Wed Aug 10 18:55:21 EEST 2011
On Tuesday 09 August 2011, Flammie Pirinen wrote:
> Please try your best to break it; I haven't performed even the most
> rudimentary cleanups to the code so it must at the very minimum leak
> memory and pollute cwd at the moment. The example zip files are at:
> <http://www.helsinki.fi/%7Etapirine/tmp/zhfst/>.
I tested with the Swedish example and it seems to work reliably, at least. I
know it is probably too early to compare this with anything, but since that is
what I like to do, here are some observations:
- Runs a lot faster (about 8 times faster) than Swedish Hunspell.
- Uses a lot more memory (vmdata size is about 50 times higher).
- On-disk size of the dictionary is about 20 times larger.
- Seems to miss more correct words than Hunspell.
- Spelling correction seems to produce lots of results for short words and
nothing for slightly longer words.
The list of words I used in these tests does not represent real-life Swedish
at all, which will affect the results. I'm very interested in seeing results
from similar tests for other HFST transducers and Hunspell dictionaries.
Harri