[libvoikko] voikkospell segmentation fault + Sámi+hfst questions
Harri Pitkänen
hatapitk at iki.fi
Mon Sep 5 17:21:53 EEST 2011
On Monday 05 September 2011, Sjur Moshagen wrote:
> That is, I removed the lines for analysis and hyphenation, because I only
> have speller transducers ATM. => Segmentation fault.
If you remove Morphology-Backend, the default (Malaga) will be assumed. The
segmentation fault occurs because the files needed by the Malaga backend are
not present.
> Part of the problem here is that I can't find any documentation on the
> content of this file. Exactly what is needed, what are the alternative
> values for each line, etc.
Yes, unfortunately there is no documentation. That is partly because I'm a bit
lazy when it comes to writing it, but there is another reason too:
voikko-fi_FI.pro is a Malaga project file, and any syntax related to
non-Malaga, non-Finnish dictionaries is experimental and not intended to be
used in stable dictionaries. Once we have a backend that is ready for
production use, we need to either move these things to a real configuration
file (and write documentation) or just use the ZIP format you have already
specified.
In this case you will need to put the following contents in the file:
info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null
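The fallback described above can be modelled with a small sketch. This is not libvoikko's own parser: the function names and the literal default value "malaga" are assumptions made for illustration. It only shows why dropping the Morphology-Backend line selects Malaga, while setting it to null does not.

```python
# Illustrative model only; this is NOT libvoikko's actual parser.
# Assumption: the default backend is represented here by the string "malaga".
ASSUMED_DEFAULT_MORPHOLOGY_BACKEND = "malaga"

CONFIG = """\
info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null
"""


def parse_info_lines(text):
    """Collect 'info: Key: Value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("info:"):
            continue
        # Split the remainder at the first colon: key on the left, value on the right.
        key, _, value = line[len("info:"):].partition(":")
        fields[key.strip()] = value.strip()
    return fields


def morphology_backend(fields):
    """Return the configured morphology backend, or the assumed default."""
    return fields.get("Morphology-Backend", ASSUMED_DEFAULT_MORPHOLOGY_BACKEND)


fields = parse_info_lines(CONFIG)

# The same configuration with the Morphology-Backend line removed.
without_morphology = parse_info_lines(
    "\n".join(l for l in CONFIG.splitlines() if "Morphology-Backend" not in l)
)
```

With the full listing, `morphology_backend(fields)` gives `null`; with the line removed, the sketch falls back to the assumed default, which corresponds to the case that needs the Malaga files.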
> In June last year there was a long discussion on getting Sámi working as a
> real language (and not a variant of FI) using the HFST backend. What
> exactly is the status on that now? What exactly is required for this setup
> to work? What transducers are needed, and how should the files be named?
You will need the following three files under ~/.voikko/2/mor-se :
- voikko-fi_FI.pro with the following contents:
info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null
- alphabet.hfstol and spl.hfstol, which should contain the alphabet and the
acceptor in the latest HFST optimized lookup format. Or that's what I assume;
the actual files I use are from Tommi, I did not build them myself. I don't
have Sámi transducers either, so I'm testing with English ones instead.
After those are in place, spelling should be testable with
"voikkospell -d se". In this configuration you won't get any spelling
suggestions. I have not tested whether those would work with the current code.
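The setup above can be sketched as a small helper script. This is a minimal sketch, not part of libvoikko: the function names are hypothetical, and the script only writes the .pro file and reports which of the three files are still missing. The two .hfstol transducers have to be supplied separately.

```python
from pathlib import Path

# File names as listed above.
REQUIRED_FILES = ("voikko-fi_FI.pro", "alphabet.hfstol", "spl.hfstol")

PRO_CONTENTS = """\
info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null
"""


def write_pro_file(dictionary_dir):
    """Create the directory if needed and write voikko-fi_FI.pro (hypothetical helper)."""
    dictionary_dir = Path(dictionary_dir)
    dictionary_dir.mkdir(parents=True, exist_ok=True)
    pro_path = dictionary_dir / "voikko-fi_FI.pro"
    pro_path.write_text(PRO_CONTENTS, encoding="utf-8")
    return pro_path


def missing_files(dictionary_dir):
    """Return the required file names that are not present in dictionary_dir."""
    dictionary_dir = Path(dictionary_dir)
    return [name for name in REQUIRED_FILES
            if not (dictionary_dir / name).is_file()]
```

For the layout described above, the directory would be `Path.home() / ".voikko" / "2" / "mor-se"`. Once `missing_files` returns an empty list for it, "voikkospell -d se" is the next thing to try.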
The status of using Sámi as a real language is that the necessary code is
already there and should work. Application-level support is currently only
available for OOo/LibreOffice; the rest (Enchant, Firefox) will probably
follow after we have a stable release of libvoikko supporting more than one
language.
Once the HFST backend (code and file formats) is stable, tested and useful for
production use, we just need to finalize the configuration code and make a
release.
Harri