[libvoikko] VoikkoSpellService + hfst questions

Sjur Moshagen sjurnm at mac.com
Thu Sep 8 20:55:43 EEST 2011


On 8 Sep 2011, at 18:15, Sjur Moshagen wrote:

>> For anything else you still need to resort to the old hack: modify voikko-
>> fi_FI.pro and change
>> 
>> info: Language-Code: se
>> 
>> to
>> 
>> info: Language-Code: fi_FI
> 
> Ok.
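
For reference, the manual edit quoted above could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `set_language_code` is made up for illustration, and it assumes the `.pro` file is plain text containing an `info: Language-Code:` line as in the quoted snippet. The second line of the sample data is a placeholder, not a real Voikko field.

```python
import re

# Hypothetical helper, not part of libvoikko: rewrites the value of the
# "info: Language-Code:" line in the text of a .pro file.
_LANG_LINE = re.compile(r"^(info: Language-Code:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_language_code(pro_text: str, new_code: str) -> str:
    """Return pro_text with the Language-Code value replaced by new_code."""
    # A function is used as the replacement so that new_code is inserted
    # literally, without regex backreference interpretation.
    return _LANG_LINE.sub(lambda m: m.group(1) + new_code, pro_text)


if __name__ == "__main__":
    # Sample text; only the Language-Code line is taken from the quoted mail.
    sample = "info: Language-Code: se\ninfo: Placeholder-Field: x\n"
    print(set_language_code(sample, "fi_FI"), end="")
```

Lines other than `info: Language-Code:` are left untouched, so the same function can be used to switch the code back afterwards.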

I got it to work - somehow. After manually switching the language from fi to sma in this way, the system did recognise the Voikko speller service (as Finnish). But AFAICT it reports every single word as misspelled, which the command-line tool voikkospell does not do. Thus there seems to be more work involved than just the things that Harri already mentioned.

If there is any interest in debugging this, I can provide both a precompiled universal binary libvoikko dylib for linking, and the speller files I used to test with.

>> We need to update the application support but for me it is more important to 
>> first get at least one non-Finnish dictionary to releasable state, integrate 
>> it into our automated tests and promote the necessary backend code from 
>> experimental to production state.
> 
> To me it is the other way around:)
> 
> I would like to see the application support for multiple languages implemented ASAP, so that we can start using the language models as proofing tools. That is in practice the only way I can get the linguists in my team to start improving the HFST dictionary quality. There are two issues that need to be addressed for the Sámi languages: 1) the processing from a raw lexical transducer to a final speller transducer is not working as it should, so that the present speller transducer is quite some distance from what we produce for MS Office users; and 2) the suggestion/correction mechanism is very different from what we have been using so far, and will require substantial work to be even comparable to what we have for MS Office users. There might be other issues as well, related to speed and memory consumption (esp. SME is way too large).

Sorry for sounding a bit negative about your approach. Tommi is hard at work finishing the Greenlandic voikko+hfst speller, which should give you the required second language. We will also work on the Sámi languages, but there is a lot of other work to do that does not directly relate to the work with hfst transducers, so I need to find a way to motivate it. Real, working applications would be a great motivator, but certainly not the only one.

I'll try to get the OOo thing working with the latest libvoikko.

Sjur



