[libvoikko] VoikkoSpellService + hfst questions

Sjur Moshagen sjurnm at mac.com
Thu Sep 8 18:15:36 EEST 2011


On 8 Sep 2011, at 17:45, Harri Pitkänen wrote:

> On Thursday 08 September 2011, Sjur Moshagen wrote:
>> In ALL these cases the system preferences (Language & text > Text > Speller
>> languages > Setup) did display one and only one language - Suomi - which I
>> assume is the Malaga-based one. This is true also in the case where I
>> deleted ALL languages. It STILL listed Suomi (Voikko) (in parallel to
>> Suomi (Soikko)) as the one and only Voikko language, even though there
>> should be no such language to my knowledge.
> 
> This is because VoikkoSpellService does not query libvoikko for available 
> languages. It just assumes that Finnish (but nothing else) is available.
> 
> As I wrote on Monday, among the publicly available frontends so far only 
> OOo/LibreOffice take advantage of support for multiple languages in libvoikko.

Sorry, I didn't get that part of your e-mail.

> For anything else you still need to resort to the old hack: modify
> voikko-fi_FI.pro and change
> 
>  info: Language-Code: se
> 
> to
> 
>  info: Language-Code: fi_FI

Ok.

> We need to update the application support but for me it is more important to 
> first get at least one non-Finnish dictionary to releasable state, integrate 
> it into our automated tests and promote the necessary backend code from 
> experimental to production state.

To me it is the other way around:)

I would like to see the application support for multiple languages implemented ASAP, so that we can start using the language models as proofing tools. In practice that is the only way I can get the linguists in my team to start improving the HFST dictionary quality. Two issues need to be addressed for the Sámi languages: 1) the processing from a raw lexical transducer to a final speller transducer is not working as it should, so the present speller transducer is quite some distance from what we produce for MS Office users; and 2) the suggestion/correction mechanism is very different from what we have been using so far, and will require substantial work to be even comparable to what we have for MS Office users. There might be other issues as well, related to speed and memory consumption (esp. SME is way too large).

That is, I would prefer a usable (but possibly unofficial) release with the application support most relevant to us (that is, OOo and VoikkoSpellService) so that we can easily eat our own dog food. That is the way to move forward for us:)

Given that, I'm pretty sure we could provide three Sámi languages with decent quality within a reasonable timeframe:)

> In case others (Marko for example) wish to start working on the application 
> code: What is needed in VoikkoSpellService and mozvoikko is a lazily 
> initialized pool of handles for each supported language and some calls to 
> voikko_list_dicts, voikko_dict_language and voikko_free_dicts to query for the 
> available dictionaries. SVN /trunk/ooovoikko/src/VoikkoHandlePool.cxx is an 
> example of a pool implementation. For VoikkoSpellService and mozvoikko 
> something slightly simpler will do since I believe they don't offer any
> on-the-fly configuration capabilities for spell checking. In other words, there
> is some but not too much work needed to support this.
> 
> Updating Enchant should be easier since it comes with its own speller pool 
> implementation.
> 
> Harri
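
For anyone picking this up, the lazily initialized pool described above could look roughly like the sketch below. It is Python rather than the C++ of VoikkoHandlePool.cxx, and it does not call libvoikko at all: LazyHandlePool, list_languages and open_handle are hypothetical names, and the two callables stand in for whatever a real frontend would wrap around voikko_list_dicts / voikko_dict_language / voikko_free_dicts and around handle initialization. The fake_* functions exist only to make the sketch runnable.

```python
class LazyHandlePool:
    """Hypothetical pool: at most one speller handle per language,
    created the first time that language is actually requested."""

    def __init__(self, list_languages, open_handle):
        # Both callables are assumptions standing in for backend calls.
        self._list_languages = list_languages
        self._open_handle = open_handle
        self._supported = None   # filled on first query
        self._handles = {}       # language code -> handle, or None on failure

    def supported_languages(self):
        # Ask the backend once instead of assuming Finnish only.
        if self._supported is None:
            self._supported = frozenset(self._list_languages())
        return self._supported

    def get_handle(self, language):
        if language not in self.supported_languages():
            return None
        if language not in self._handles:
            try:
                self._handles[language] = self._open_handle(language)
            except Exception:
                # Remember the failure so it is not retried on every word.
                self._handles[language] = None
        return self._handles[language]


# Stand-ins for the real backend, for illustration only.
opened = []


def fake_list_languages():
    return ["fi", "se"]


def fake_open_handle(language):
    opened.append(language)
    return "handle-for-" + language


pool = LazyHandlePool(fake_list_languages, fake_open_handle)
```

Nothing is opened until the first pool.get_handle("se") call, and a second call for the same language reuses the cached handle. Since VoikkoSpellService and mozvoikko have no on-the-fly configuration, the sketch leaves out any way to reset or reconfigure handles.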

Sjur



