[libvoikko] voikkospell segmentation fault + Sámi+hfst questions

Harri Pitkänen hatapitk at iki.fi
Mon Sep 5 17:21:53 EEST 2011


On Monday 05 September 2011, Sjur Moshagen wrote:
> That is, I removed the lines for analysis and hyphenation, because I only
> have speller transducers ATM. => Segmentation fault.

If you remove Morphology-Backend, the default (Malaga) will be assumed. 
The segmentation fault occurs because the files needed by the Malaga 
backend are not present.
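
One way to catch this before running voikkospell is to check which backend 
lines the project file actually sets. The following is a minimal sketch, not 
part of libvoikko: check_backends is a hypothetical helper, and it only looks 
for the three backend keys used in the configuration in this message.

```shell
#!/bin/sh
# Sketch only: check_backends is a hypothetical helper, not part of libvoikko.
# It prints every backend key that the given project file does not set and
# returns non-zero if at least one is missing. A missing Morphology-Backend
# line means the default (Malaga) backend will be assumed.
check_backends() {
    pro_file=$1
    missing=0
    for key in Morphology-Backend Speller-Backend Suggestion-Backend; do
        if ! grep -q "^info: $key:" "$pro_file"; then
            echo "not set: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as "check_backends path/to/voikko-fi_FI.pro"; a line such 
as "not set: Morphology-Backend" would point at the cause described above.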

> Part of the problem here is that I can't find any documentation on the
> content of this file. Exactly what is needed, what are the alternative
> values for each line, etc.

Yes, unfortunately there is no documentation. That is partly because I'm a 
bit lazy when it comes to writing it, but there is another reason too: 
voikko-fi_FI.pro is a Malaga project file, and any syntax related to 
non-Malaga, non-Finnish dictionaries is experimental and not intended to be 
used in stable dictionaries. Once we have a backend that is ready for 
production use, we need to either move these things to a real configuration 
file (and write documentation) or just use the ZIP format you have already 
specified.

In this case you will need to put the following contents into the file:

info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null

> In June last year there was a long discussion on getting Sámi working as a
> real language (and not a variant of FI) using the HFST backend. What
> exactly is the status on that now? What exactly is required for this setup
> to work? What transducers are needed, and how should the files be named?

You will need the following three files under ~/.voikko/2/mor-se :

- voikko-fi_FI.pro with the following contents:

info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null

- alphabet.hfstol and spl.hfstol, which should contain the acceptor and 
alphabet in the latest HFST optimized lookup format. Or that's what I assume; 
the actual files I use are from Tommi, I did not build them myself. I don't 
have Sámi transducers either, so I'm testing with English ones instead.

After those are in place, spelling should be testable with "voikkospell -d 
se". In this configuration you won't get any spelling suggestions. I have not 
tested whether those would work with the current code.
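
The setup described above can be sketched as a small shell script. This is 
only an illustration under stated assumptions: DICT_ROOT is a hypothetical 
variable introduced here so the script can be tried against a scratch 
directory, and the script does not build or fetch the transducers, it only 
reports whether they are in place.

```shell
#!/bin/sh
# Sketch only: DICT_ROOT is a hypothetical override for dry runs.
# The directory layout and file names are the ones given in this message.
DICT_ROOT="${DICT_ROOT:-$HOME/.voikko}"
DICT_DIR="$DICT_ROOT/2/mor-se"

mkdir -p "$DICT_DIR"

# Write the experimental project file with every backend selected explicitly.
cat > "$DICT_DIR/voikko-fi_FI.pro" <<'EOF'
info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null
EOF

# The transducers must be copied in separately; report what is present.
for f in alphabet.hfstol spl.hfstol; do
    if [ -f "$DICT_DIR/$f" ]; then
        echo "found: $f"
    else
        echo "missing: $f"
    fi
done

echo "Then try: voikkospell -d se"
```

Once both .hfstol files are reported as found, the voikkospell command 
printed on the last line is the test mentioned above.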

The status of using Sámi as a real language is that the necessary code is 
already there and should work. Application-level support is currently only 
available for OOo/LibreOffice; the rest (Enchant, Firefox) will probably 
follow after we have a stable release of libvoikko supporting more than one 
language.

Once the HFST backend (code and file formats) is stable, tested and useful 
for production use, we just need to finalize the configuration code and make 
a release.

Harri


