[libvoikko] Sámi/HFST

Harri Pitkänen hatapitk at iki.fi
Tue Jun 29 11:33:19 EEST 2010


On Tuesday 29 June 2010, Harri Pitkänen wrote:
> > Or use <http://www.helsinki.fi/~tapirine/tmp/se_FI.sug.hfst>, however
> > that cannot be easily modified.
> 
> This does not work either. It now appears that there is a bug in
> libvoikko. I'll continue the investigation.

I'm afraid this is an HFST bug after all. If I set

  info: Morphology-Backend: null

then no suggestions are returned, but voikkospell does not crash. If I set

  info: Morphology-Backend: hfst

then voikkospell crashes during spell checking. However, in both cases I had

  info: Speller-Backend: hfst
  info: Suggestion-Backend: hfst

which means the morphology backend is not actually used during spell checking. 
I verified this by changing the null morphology backend so that it would print 
out something whenever it was called, and nothing was printed.
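A minimal C++ sketch of the check described above, under stated assumptions: `parseInfoLines`, `SetSpeller`, `CountingNullAnalyzer` and `spellWord` are hypothetical names, not libvoikko's API. The dispatch rule (a configured `Speller-Backend: hfst` means spelling never consults the morphology backend) is inferred from the observation above, not taken from libvoikko's source. The sketch parses the `info:` lines shown earlier and counts calls to a null analyzer, so "nothing was printed" becomes "the call count stays at zero".

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Parse lines of the form "info: Key: Value" (leading whitespace allowed)
// into a key -> value map. Any other line is ignored.
std::map<std::string, std::string> parseInfoLines(const std::string& text) {
    const std::string prefix = "info: ";
    std::map<std::string, std::string> info;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t start = line.find_first_not_of(" \t");
        if (start == std::string::npos) continue;  // blank line
        line.erase(0, start);
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        const std::string rest = line.substr(prefix.size());
        const std::size_t sep = rest.find(": ");
        if (sep == std::string::npos) continue;
        info[rest.substr(0, sep)] = rest.substr(sep + 2);
    }
    return info;
}

// Hypothetical stand-in for the HFST speller: accepts words from a fixed set.
struct SetSpeller {
    std::set<std::string> words;
    bool spell(const std::string& word) const { return words.count(word) > 0; }
};

// Hypothetical instrumented null morphology backend: it never finds an
// analysis, but it counts every call and prints a message, like the
// modified backend described above.
struct CountingNullAnalyzer {
    std::size_t calls = 0;
    bool hasAnalysis(const std::string& word) {
        ++calls;
        std::cerr << "null analyzer called for: " << word << '\n';
        return false;
    }
};

// Assumed dispatch rule: when Speller-Backend is "hfst", spelling goes
// through the speller only; otherwise it falls back to morphology.
bool spellWord(const std::string& word,
               const std::map<std::string, std::string>& info,
               const SetSpeller& speller, CountingNullAnalyzer& analyzer) {
    const auto it = info.find("Speller-Backend");
    if (it != info.end() && it->second == "hfst") {
        return speller.spell(word);
    }
    return analyzer.hasAnalysis(word);
}
```

With the three `info:` lines from above, checking "kissa" leaves `analyzer.calls` at zero. Only a configuration without `Speller-Backend: hfst` reaches the analyzer.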

This suggests that merely initializing the HFST morphology backend somehow 
breaks the speller backend. Perhaps the two share global data somewhere they 
should not? The debugger shows that the crash is a stack overflow caused by 
deep recursion in HWFST::find_all_continuations, called from the HFST speller 
backend. See the debugger session below.
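To make the shared-global-data suspicion concrete, here is a hypothetical C++ sketch. It is not HFST code, and whether HFST actually keeps such a process-wide symbol table is an assumption. Two backends record the epsilon symbol's id when they are initialized. If both draw ids from one global table that each initialization renumbers, setting up the second backend silently invalidates the first. The symbol name `@0@` is only a stand-in for epsilon here. A stale epsilon id is one conceivable way a speller could start following a loop it should not, which would be consistent with a stack overflow.

```cpp
#include <map>
#include <string>
#include <vector>

using SymbolTable = std::map<std::string, int>;

// Hypothetical process-wide table that every backend would share
// (the suspected anti-pattern; not HFST's actual data structure).
SymbolTable& sharedTable() {
    static SymbolTable table;
    return table;
}

// Renumber symbols 0..n-1 in the order a transducer file lists them.
// Clearing first means any ids handed out earlier are silently reassigned.
void fillTable(SymbolTable& table, const std::vector<std::string>& symbols) {
    table.clear();
    int next = 0;
    for (const std::string& s : symbols) table[s] = next++;
}

// A backend remembers the epsilon id it was built with and keeps a pointer
// to the table it will consult at lookup time.
struct Backend {
    const SymbolTable* table;
    int epsilonAtInit;
    explicit Backend(const SymbolTable& t)
        : table(&t), epsilonAtInit(t.at("@0@")) {}
    // True while the table still agrees with the backend's cached id.
    bool consistent() const { return table->at("@0@") == epsilonAtInit; }
};
```

In this sketch, filling the shared table for the speller and then again for the morphology backend leaves the speller's cached epsilon id (0) pointing at `a`. Giving each backend its own `SymbolTable` keeps both consistent regardless of initialization order.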

Harri


$ gdb /home/harri/apps/bin/voikkospell
GNU gdb (GDB) 7.1-debian
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/harri/apps/bin/voikkospell...done.
(gdb) set args -d fi-x-sme -s
(gdb) run
Starting program: /home/harri/apps/bin/voikkospell -d fi-x-sme -s
[Thread debugging using libthread_db enabled]
kissa

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6b97468 in HWFST::find_all_continuations (n=25, 
input_position=Cannot access memory at address 0x7fffff7fefe8
) at ofst/hwfst-lookup.C:165
165     ofst/hwfst-lookup.C: No such file or directory.
        in ofst/hwfst-lookup.C
(gdb) bt 20
#0  0x00007ffff6b97468 in HWFST::find_all_continuations (n=25, 
input_position=Cannot access memory at address 0x7fffff7fefe8
) at ofst/hwfst-lookup.C:165
#1  0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#2  0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#3  0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#4  0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#5  0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#6  0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#7  0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#8  0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#9  0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#10 0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#11 0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#12 0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#13 0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#14 0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#15 0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#16 0x00007ffff6b978d2 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
#17 0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#18 0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
#19 0x00007ffff6b97813 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
(More stack frames follow...)


(gdb) bt -20
#47634 0x00007ffff6b97569 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:288
#47635 0x00007ffff6b9763d in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:270
#47636 0x00007ffff6b97569 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:288
#47637 0x00007ffff6b9763d in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:270
#47638 0x00007ffff6b97569 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:288
#47639 0x00007ffff6b9763d in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:270
#47640 0x00007ffff6b97569 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:288
#47641 0x00007ffff6b97569 in HWFST::find_all_continuations (n=<value optimized 
out>, input_position=..., input_end_position=..., t=<value optimized out>, 
skip_symbols=0x60e120, 
    preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:288
#47642 0x00007ffff6b97b0b in HWFST::find_all_output_strings (t=..., 
input=0xb76fc0, skip_symbols=0x60e120) at ofst/hwfst-lookup.C:345
#47643 0x00007ffff6ac4f28 in HWFST::lookup_all (t=0x19, 
input_string=0x3ba3a6a, skip_symbols=0x3ba3a6a) at ofst/hofst.C:1878
#47644 0x00007ffff7bb5ba5 in libvoikko::spellchecker::HfstSpeller::doSpell 
(this=0x60e0a0, word=<value optimized out>, wlen=<value optimized out>) at 
spellchecker/HfstSpeller.cpp:53
#47645 0x00007ffff7bb5d0d in libvoikko::spellchecker::HfstSpeller::spell 
(this=0x60e0a0, word=0x2193bf0 L"kissa", wlen=62536298) at 
spellchecker/HfstSpeller.cpp:77
#47646 0x00007ffff7b9b6f0 in libvoikko::voikko_do_spell 
(voikkoOptions=0x60b040, word=0x3ba3a6a L"ι\x200000", len=62536298) at 
spellchecker/spell.cpp:42
#47647 0x00007ffff7b9ba30 in hyphenAwareSpell (voikkoOptions=0x19, 
word=0x3ba3a6a L"ι\x200000", len=62536298) at spellchecker/spell.cpp:142
#47648 0x00007ffff7b9bb8e in voikko_cached_spell (voikkoOptions=0x60b040, 
buffer=0x2193bf0 L"kissa", len=5) at spellchecker/spell.cpp:184
#47649 0x00007ffff7b9bda2 in voikkoSpellUcs4 (voikkoOptions=0x60b040, 
word=<value optimized out>) at spellchecker/spell.cpp:275
#47650 0x0000000000402efe in check_word (handle=0x19, word=0x3ba3a6a 
L"ι\x200000", out=...) at voikkospell.cpp:78
#47651 0x0000000000403626 in handleWordSingleThread (word=<value optimized 
out>) at voikkospell.cpp:224
#47652 0x00000000004049b5 in handleWord (argc=4, argv=0x7fffffffe7c8) at 
voikkospell.cpp:232
#47653 main (argc=4, argv=0x7fffffffe7c8) at voikkospell.cpp:429
(gdb) 
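The backtrace shows more than 47,000 nested calls to HWFST::find_all_continuations, recursing at hwfst-lookup.C lines 183, 201, 270 and 288, before the stack runs out. One common cause of unbounded recursion in transducer lookup is an epsilon cycle: a loop of arcs that consume no input, so a naive depth-first search revisits the same (state, input position) pair forever. Whether that is what happens inside HFST here is an assumption. The sketch below uses a hypothetical toy transducer (`Arc`, `Transducer`, `findAll` and `lookup` are illustrative names, not HFST's data structures). It shows the failure mode and a simple guard that prunes any (state, input position) pair already on the current recursion path.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy transducer arc; '\0' means epsilon (no symbol).
struct Arc {
    char in;     // input symbol, or '\0' for an input epsilon
    char out;    // output symbol, or '\0' for no output
    int target;  // destination state
};

struct Transducer {
    std::vector<std::vector<Arc>> arcs;  // arcs[state] = outgoing arcs
    std::set<int> finals;                // accepting states
};

// Depth-first search for every output string. `onPath` holds the
// (state, input position) pairs on the current recursion path; meeting one
// again means an epsilon cycle was followed without consuming input, which
// would otherwise recurse until the stack overflows.
void findAll(const Transducer& t, int state, const std::string& input,
             std::size_t pos, std::string& out,
             std::set<std::pair<int, std::size_t>>& onPath,
             std::vector<std::string>& results) {
    const std::pair<int, std::size_t> key(state, pos);
    if (onPath.count(key) > 0) return;  // epsilon cycle: prune this branch
    onPath.insert(key);
    if (pos == input.size() && t.finals.count(state) > 0) results.push_back(out);
    for (const Arc& a : t.arcs[state]) {
        const bool epsilon = (a.in == '\0');
        if (!epsilon && (pos >= input.size() || input[pos] != a.in)) continue;
        if (a.out != '\0') out.push_back(a.out);
        findAll(t, a.target, input, epsilon ? pos : pos + 1, out, onPath, results);
        if (a.out != '\0') out.pop_back();
    }
    onPath.erase(key);  // leaving this state: it is no longer on the path
}

std::vector<std::string> lookup(const Transducer& t, const std::string& input) {
    std::vector<std::string> results;
    std::string out;
    std::set<std::pair<int, std::size_t>> onPath;
    findAll(t, 0, input, 0, out, onPath, results);
    return results;
}
```

Consider a transducer with an epsilon loop between states 1 and 2. Without the `onPath` check, looking up "a" would bounce between those two states until the stack overflowed. With the check, the loop is cut and the single output "A" is still found. Pruning also discards outputs that only an epsilon cycle could produce, so a real implementation might instead prefer a depth limit or an explicit iterative stack.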


