[libvoikko] voikkospell segmentation fault + Sámi+hfst questions

Sjur Moshagen sjurnm at mac.com
Mon Sep 5 18:12:05 EEST 2011


On 5 Sep 2011, at 17:21, Harri Pitkänen wrote:

> On Monday 05 September 2011, Sjur Moshagen wrote:
>> That is, I removed the lines for analysis and hyphenation, because I only
>> have speller transducers ATM. => Segmentation fault.
> 
> If you remove Morphology-Backend, default (Malaga) will be assumed. 
> Segmentation fault occurs because files needed by Malaga backend are not 
> present.

Ok.

>> Part of the problem here is that I can't find any documentation on the
>> content of this file. Exactly what is needed, what are the alternative
>> values for each line, etc.
> 
> Yes, unfortunately there is no documentation. Partly because I'm a bit lazy 
> when it comes to writing it but there is another reason too. voikko-fi_FI.pro 
> is a Malaga project file and any syntax related to non-Malaga, non-Finnish 
> dictionaries is experimental and it is not intended to be used in stable 
> dictionaries. Once we have a backend that is ready for production use we need 
> to either move these things to a real configuration file (and write 
> documentation) or just use the ZIP format you have already specified.

The zip file should be fine. I just need things to work right now so I can test how the spellers compare. That is, I want to compare the Voikko output (using the hfst-based SE transducer) with the Hunspell variant and with our MS Office variant. I also want to do the same for Norwegian Bokmål (but excluding Voikko+hfst, since we have no good transducer for that language).

> In this case you will need to put the following contents to the file:
> 
> info: Voikko-Dictionary-Format: 2
> info: Language-Code: se
> info: Language-Variant: standard
> info: Description: Kokeellinen pohjoissaamen morfologia
> info: Morphology-Backend: null
> info: Speller-Backend: hfst
> info: Suggestion-Backend: null

Thanks, this is exactly what I need to know.

> You will need following three files under ~/.voikko/2/mor-se :
> 
> - voikko-fi_FI.pro with the following contents:
> 
> info: Voikko-Dictionary-Format: 2
> info: Language-Code: se
> info: Language-Variant: standard
> info: Description: Kokeellinen pohjoissaamen morfologia
> info: Morphology-Backend: null
> info: Speller-Backend: hfst
> info: Suggestion-Backend: null
> 
> - alphabet.hfstol and spl.hfstol which should contain the acceptor and 
> alphabet in latest HFST optimized lookup format. Or that's what I assume, the 
> actual files I use are from Tommi, I did not build them myself. I don't have 
> Sámi transducers either, I'm testing with English ones instead.
> 
> After those are in place, spelling should be testable with "voikkospell -d 
> se". In this configuration you won't get any spelling suggestions. I have not 
> tested if those would work with current code.
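
For reference, the setup described above can be scripted roughly as follows. This is only a minimal sketch: the VOIKKO_DICT_DIR variable is a made-up override for illustration (libvoikko itself does not read it, as far as I know), and the default is the directory named above. The two .hfstol files still have to be copied in by hand.

```shell
#!/bin/sh
# Sketch only: writes the experimental dictionary config quoted above.
# VOIKKO_DICT_DIR is a hypothetical override used for illustration;
# the default is the directory named in the message.
set -eu
dict_dir="${VOIKKO_DICT_DIR:-$HOME/.voikko/2/mor-se}"
mkdir -p "$dict_dir"

# The file keeps its Malaga project name even though the language is "se".
cat > "$dict_dir/voikko-fi_FI.pro" <<'EOF'
info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null
EOF

# spl.hfstol and alphabet.hfstol are not created here; copy them into
# $dict_dir separately.
echo "wrote $dict_dir/voikko-fi_FI.pro"
```

After that, "voikkospell -l" should list the se dictionary, as it does below.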

I created the voikko-fi_FI.pro file as specified, copied a working acceptor transducer (i.e. one that works with hfst-ospell) to the specified filename, and simply copied the alphabet.hfstol file from the omorfi variant. But the output is not as expected:

$ voikkospell -l
se-x-standard: Kokeellinen pohjoissaamen morfologia
fi-x-standard: Voikon perussanasto
fi-x-hfst: Kokeellinen HFST-morfologia
fi-x-hfstold: First Kokeellinen HFST-morfologia

$ voikkospell -s -d se
E: Initialization of Voikko failed: Failed to create speller because backend configuration could not be parsed

There are several possible sources of this error:
* the alphabet.hfstol file is wrong for SE (but I don't know what it is expected to contain) - Tommi, can you answer that?
* I'm using HFST3 files now, and it might be that the hfst backend code doesn't support them yet - Tommi, what is the status?
* it might well be that I haven't managed to compile libvoikko with proper hfst backend support
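
To at least rule out the trivial case of a missing or empty file, a check along these lines can be run first. It is a rough sketch that only tests for the three file names mentioned above. It says nothing about whether the transducer format is one the backend accepts, and VOIKKO_DICT_DIR is again a hypothetical override of my own, not a libvoikko setting.

```shell
#!/bin/sh
# Sketch only: report which of the three expected files are missing or empty.
# VOIKKO_DICT_DIR is a hypothetical override used for illustration.
dict_dir="${VOIKKO_DICT_DIR:-$HOME/.voikko/2/mor-se}"
missing=0
for f in voikko-fi_FI.pro spl.hfstol alphabet.hfstol; do
    # -s is true only if the file exists and is not empty
    if [ -s "$dict_dir/$f" ]; then
        echo "ok:      $f"
    else
        echo "missing: $f"
        missing=$((missing + 1))
    fi
done
echo "$missing file(s) missing or empty in $dict_dir"
```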

Any feedback is welcome :)

> The status of using Sámi as a real language is that the necessary code is 
> already there and should work.

Ok. I'm not there yet, but hopefully I will be soon :)

> Application level support is currently only 
> available for OOo/LibreOffice, the rest (Enchant, Firefox) will probably 
> follow after we have a stable release of libvoikko supporting more than one 
> language.

Since I'm on Mac OS X, I would really like to see the VoikkoSpellService plugin updated to include proper support for hfst languages as well :)

I have tried to compile it, but have not been able to.

> Once HFST backend (code and file formats) is stable, tested and useful for 
> production use we just need to finalize the configuration code and make a 
> release.

That sounds great!

Sjur