[libvoikko] voikkospell segmentation fault + Sámi+hfst questions
Sjur Moshagen
sjurnm at mac.com
Mon Sep 5 18:12:05 EEST 2011
On 5 Sep 2011, at 17:21, Harri Pitkänen wrote:
> On Monday 05 September 2011, Sjur Moshagen wrote:
>> That is, I removed the lines for analysis and hyphenation, because I only
>> have speller transducers ATM. => Segmentation fault.
>
> If you remove Morphology-Backend, the default (Malaga) will be assumed.
> The segmentation fault occurs because the files needed by the Malaga backend
> are not present.
Ok.
>> Part of the problem here is that I can't find any documentation on the
>> content of this file. Exactly what is needed, what are the alternative
>> values for each line, etc.
>
> Yes, unfortunately there is no documentation. Partly because I'm a bit lazy
> when it comes to writing it, but there is another reason too. voikko-fi_FI.pro
> is a Malaga project file and any syntax related to non-Malaga, non-Finnish
> dictionaries is experimental and it is not intended to be used in stable
> dictionaries. Once we have a backend that is ready for production use we need
> to either move these things to a real configuration file (and write
> documentation) or just use the ZIP format you have already specified.
The zip file should be fine. I just need things to work right now so that I can test how the spellers compare. That is, I want to compare the Voikko output (using the hfst-based SE transducer) with the Hunspell variant and with our MS Office variant. I also want to do a similar comparison for Norwegian Bokmål (but excluding Voikko+hfst, since we have no good transducer for that language).
> In this case you will need to put the following contents into the file:
>
> info: Voikko-Dictionary-Format: 2
> info: Language-Code: se
> info: Language-Variant: standard
> info: Description: Kokeellinen pohjoissaamen morfologia
> info: Morphology-Backend: null
> info: Speller-Backend: hfst
> info: Suggestion-Backend: null
Thanks, this is exactly what I need to know.
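For illustration only: the `info: Key: Value` lines above can be read as simple key/value pairs. The sketch below is a hypothetical helper (`parse_pro` is not part of libvoikko, whose real parser may behave differently). It also applies the fallback Harri describes, where a missing Morphology-Backend line means Malaga is assumed; the string `"malaga"` used for that default is an assumption.

```python
from typing import Dict


# Hypothetical helper, not libvoikko's own parser: reads the
# "info: Key: Value" lines of a voikko-fi_FI.pro file into a dict.
def parse_pro(text: str) -> Dict[str, str]:
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("info:"):
            continue  # skip blank lines and anything that is not an info line
        # Split "Key: Value" at the first colon only.
        key, sep, value = line[len("info:"):].partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    # Per the thread: without Morphology-Backend, Malaga is assumed, which
    # fails when the Malaga data files are absent. ("malaga" is an assumed value.)
    settings.setdefault("Morphology-Backend", "malaga")
    return settings


SE_PRO = """\
info: Voikko-Dictionary-Format: 2
info: Language-Code: se
info: Language-Variant: standard
info: Description: Kokeellinen pohjoissaamen morfologia
info: Morphology-Backend: null
info: Speller-Backend: hfst
info: Suggestion-Backend: null
"""

if __name__ == "__main__":
    config = parse_pro(SE_PRO)
    print(config["Speller-Backend"], config["Morphology-Backend"])
```

Run directly, this prints `hfst null`; dropping the Morphology-Backend line from `SE_PRO` makes the sketch fall back to Malaga, mirroring the segfault scenario above.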
> You will need the following three files under ~/.voikko/2/mor-se:
>
> - voikko-fi_FI.pro with the following contents:
>
> info: Voikko-Dictionary-Format: 2
> info: Language-Code: se
> info: Language-Variant: standard
> info: Description: Kokeellinen pohjoissaamen morfologia
> info: Morphology-Backend: null
> info: Speller-Backend: hfst
> info: Suggestion-Backend: null
>
> - alphabet.hfstol and spl.hfstol, which should contain the alphabet and the
> acceptor in the latest HFST optimized-lookup format. At least that's what I
> assume; the actual files I use are from Tommi, I did not build them myself. I
> don't have Sámi transducers either; I'm testing with English ones instead.
>
> After those are in place, spelling should be testable with
> "voikkospell -d se". In this configuration you won't get any spelling
> suggestions. I have not tested whether those would work with the current code.
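A minimal sketch of the layout described above, assuming libvoikko reads per-user dictionaries from `~/.voikko/2/mor-se` as stated. `prepare_dictionary` is a hypothetical helper: it writes the `.pro` file and reports which of the three files are still missing. It cannot create the transducers; those still have to be copied in by hand.

```python
from pathlib import Path
from typing import List

# Contents of voikko-fi_FI.pro as given in the thread.
PRO_TEXT = "\n".join([
    "info: Voikko-Dictionary-Format: 2",
    "info: Language-Code: se",
    "info: Language-Variant: standard",
    "info: Description: Kokeellinen pohjoissaamen morfologia",
    "info: Morphology-Backend: null",
    "info: Speller-Backend: hfst",
    "info: Suggestion-Backend: null",
]) + "\n"

# The three files Harri lists for the mor-se directory.
REQUIRED_FILES = ["voikko-fi_FI.pro", "alphabet.hfstol", "spl.hfstol"]


def prepare_dictionary(voikko_home: Path) -> List[str]:
    """Write the .pro file under voikko_home/2/mor-se and return missing files."""
    dict_dir = voikko_home / "2" / "mor-se"
    dict_dir.mkdir(parents=True, exist_ok=True)
    # Overwrites any existing .pro file with the contents above.
    (dict_dir / "voikko-fi_FI.pro").write_text(PRO_TEXT, encoding="utf-8")
    # The transducers are not generated here; only report whether they exist.
    return [name for name in REQUIRED_FILES if not (dict_dir / name).is_file()]
```

On a real system one would call `prepare_dictionary(Path.home() / ".voikko")`, copy in whatever it reports as missing, and then try `voikkospell -d se`.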
I created the voikko-fi_FI.pro file as specified, copied a working acceptor transducer (i.e. one that works with hfst-ospell) to the specified filename, and simply copied the alphabet.hfstol file from the omorfi variant. But the output is not as expected:
$ voikkospell -l
se-x-standard: Kokeellinen pohjoissaamen morfologia
fi-x-standard: Voikon perussanasto
fi-x-hfst: Kokeellinen HFST-morfologia
fi-x-hfstold: First Kokeellinen HFST-morfologia
$ voikkospell -s -d se
E: Initialization of Voikko failed: Failed to create speller because backend configuration could not be parsed
There can be several sources of this error:
* the alphabet.hfstol file is bad for SE (but I don't know what it is expected to be) - Tommi, can you answer that?
* also, I'm using HFST3 files now, it might be that the hfst backend code doesn't yet support that - Tommi, what is the status?
* and it might well be that I haven't managed to compile libvoikko with proper hfst backend support
Any feedback welcome:)
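One way to narrow down the second possibility is to check whether each transducer file carries an HFST3 header. The sketch below assumes, based on our understanding of the HFST3 binary format, that such files begin with the bytes `HFST` followed by a NUL byte, while older bare optimized-lookup files do not; verify this against the HFST documentation before relying on it. `describe_transducer` and `diagnose` are hypothetical helpers, and the check says nothing about whether libvoikko's hfst backend actually accepts a given format.

```python
from pathlib import Path
from typing import Dict

# Assumption: HFST3 binary transducers start with this signature.
HFST3_MAGIC = b"HFST\x00"


def describe_transducer(path: Path) -> str:
    """Classify a transducer file by its leading bytes."""
    if not path.is_file():
        return "missing"
    with path.open("rb") as handle:
        prefix = handle.read(len(HFST3_MAGIC))
    return "hfst3-header" if prefix == HFST3_MAGIC else "no-hfst3-header"


def diagnose(dict_dir: Path) -> Dict[str, str]:
    """Report the state of the two transducers the hfst speller setup uses."""
    return {
        name: describe_transducer(dict_dir / name)
        for name in ("alphabet.hfstol", "spl.hfstol")
    }
```

For example, `diagnose(Path.home() / ".voikko" / "2" / "mor-se")`. A result of `hfst3-header` for spl.hfstol would be consistent with (though not proof of) the HFST3-support hypothesis.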
> The status of using Sámi as a real language is that the necessary code is
> already there and should work.
Ok. I'm not there yet, but hopefully I will be soon :)
> Application level support is currently only
> available for OOo/LibreOffice, the rest (Enchant, Firefox) will probably
> follow after we have a stable release of libvoikko supporting more than one
> language.
Since I'm on MacOS X, I would really like to see the VoikkoSpellService plugin updated to include proper support for hfst languages as well :)
I have tried to compile it myself, but have not been able to.
> Once HFST backend (code and file formats) is stable, tested and useful for
> production use we just need to finalize the configuration code and make a
> release.
That sounds great!
Sjur