[libvoikko] Setting up HFST morphology backend

Sjur Moshagen sjurnm at mac.com
Mon Apr 26 22:54:09 EEST 2010


On 26 Apr 2010, at 19:54, Harri Pitkänen wrote:

> On Monday 26 April 2010, Sjur Moshagen wrote:
>> a83-245-189-120:suomimalaga sjur$ voikkospell
>> E: Initialization of Voikko failed: No valid dictionaries were found
>> 
>> I thought step 6 above would take care of that?
> 
> This is strange, it should work. Although possibly nobody has tried to do this 
> on OS X which means there could be some unknown bugs left either in the code 
> or in the instructions. What does the following command show
> 
>  ls -l ~/.voikko/2/*

Thanks, my mistake: a copy-paste error caused the link to point to the wrong location :/

It works now:)

> However for testing HFST the instructions Flammie posted are more useful. The 
> automated test suite contains no tests for HFST. It is possible to use 
> libvoikko/malaga tests directly with HFST but many of them fail because Omorfi 
> does not support many of the features that are present in Suomi-malaga.

Ok.

One of the things I'm going to do - irrespective of which backend is used - is to add voikko support for our proofing tools test bench. For that I need some help with running voikkospell and voikkohyphenate, that is, I need an overview of the command line options for the two commands. I could not find anything online or in the documentation in svn.

What I want to do:
- run a file containing one word per line through the speller
- get back not only a simple C or W evaluation, but also - if W - the list of suggestions, possibly with some weighting info as well

The result is compared to the input strings as well as with the expected behavior (taken from gold standard documents/hand-annotated corrected texts), and precision & recall, spelling error statistics, suggestion statistics etc. are automatically calculated. 
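
To make the precision and recall step concrete, here is a minimal sketch in Python. It assumes each test word has already been paired with a gold-standard label (is it really an error?) and with the speller's verdict (W = flagged, C = accepted). The names Judgement and precision_recall are made up for this illustration and are not part of voikko or of our test bench, and the sample words are invented rather than taken from any gold standard document.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """One word from the test file (hypothetical structure)."""
    word: str
    is_error: bool  # gold standard: the word really is misspelled
    flagged: bool   # speller verdict: True for W, False for C


def precision_recall(judgements):
    """Return (precision, recall) for error detection.

    precision = correctly flagged errors / all flagged words
    recall    = correctly flagged errors / all real errors
    Either value is 0.0 when its denominator is zero.
    """
    tp = sum(1 for j in judgements if j.is_error and j.flagged)
    fp = sum(1 for j in judgements if not j.is_error and j.flagged)
    fn = sum(1 for j in judgements if j.is_error and not j.flagged)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented sample data, for illustration only.
sample = [
    Judgement("giella", is_error=False, flagged=False),     # correctly accepted
    Judgement("giela", is_error=True, flagged=True),        # correctly flagged
    Judgement("sámegiella", is_error=False, flagged=True),  # false alarm
    Judgement("sapmi", is_error=True, flagged=False),       # missed error
    Judgement("gilea", is_error=True, flagged=True),        # correctly flagged
]

precision, recall = precision_recall(sample)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Suggestion statistics could be added in the same way, by also storing the suggestion list for each flagged word and checking at which position the expected correction appears.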

We have gold standard documents for three Sámi languages, and it is easy to add more for new languages. For two Sámi languages we have working lexicons for both hunspell and our MS Office speller, so it is straightforward to present comparisons of the type you were looking for in your e-mail a couple of days ago.

The things I want to do are pretty straightforward, but not easy without documentation (a simple -h / --help option would have been enough) ;)

I'm sure this is possible, judging from the tests that already ship with the voikko sources, but I could not figure it out.

Best,
Sjur



