[libvoikko] Sámi/HFST

Flammie Pirinen flammie at iki.fi
Tue Jun 29 17:40:20 EEST 2010


2010-06-29, Harri Pitkänen wrote:

> This suggests that simply initializing the HFST morphology backend
> breaks the speller backend somehow. Maybe they are using shared
> global data somewhere where they should not? Debugger shows that the
> crash comes from stack overflow within HFST speller backend. See the
> debugger session below.

My initial guess is that the flag diacritic implementation is the main
suspect here, since it was patched on top of everything at quite a fast
pace and never really cleaned up.

> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff6b97468 in HWFST::find_all_continuations (n=25, 
> input_position=Cannot access memory at address 0x7fffff7fefe8
> ) at ofst/hwfst-lookup.C:165
> 165     ofst/hwfst-lookup.C: No such file or directory.
>         in ofst/hwfst-lookup.C
> (gdb) bt 20
> #0  0x00007ffff6b97468 in HWFST::find_all_continuations (n=25, 
> input_position=Cannot access memory at address 0x7fffff7fefe8
> ) at ofst/hwfst-lookup.C:165
> #1  0x00007ffff6b97813 in HWFST::find_all_continuations (
>     n=<value optimized out>, input_position=..., input_end_position=...,
>     t=<value optimized out>, skip_symbols=0x60e120,
>     preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:201
> #2  0x00007ffff6b978d2 in HWFST::find_all_continuations (
>     n=<value optimized out>, input_position=..., input_end_position=...,
>     t=<value optimized out>, skip_symbols=0x60e120,
>     preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183
> #3  0x00007ffff6b978d2 in HWFST::find_all_continuations (
>     n=<value optimized out>, input_position=..., input_end_position=...,
>     t=<value optimized out>, skip_symbols=0x60e120,
>     preserve_epsilons=<value optimized out>) at ofst/hwfst-lookup.C:183

It would appear that lookup either produces infinitely many results or
gets stuck somehow. I have to admit that when I patched together the HFST
support in voikko I left out the sanity checks on the morphology, because
I assumed that no working morphology should produce infinite results on
lookup. I'll try to patch it tonight.
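
A sanity check of this kind could look roughly like the following. It is
only a minimal sketch on a toy transducer, not HFST or voikko code: ToyFst,
Arc, lookup and the depth_left budget are all made-up names, and bounding
the recursion depth is just one assumed way of guarding against a
morphology that yields infinitely many results.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy transducer (hypothetical, not the HFST data structures).
// An input label of 0 is an epsilon: the arc consumes no input.
struct Arc {
    char input;   // 0 means epsilon
    char output;  // 0 means no output symbol
    int target;
};

struct ToyFst {
    std::map<int, std::vector<Arc>> arcs;  // arcs keyed by source state
    int final_state = 0;
};

// Collects every output string for `input`, starting at `state` and `pos`.
// Returns false if the depth budget ran out somewhere, which means the
// result list may be incomplete (for example because of an epsilon cycle).
bool lookup(const ToyFst& fst, int state, const std::string& input,
            std::size_t pos, std::string& out, std::size_t depth_left,
            std::vector<std::string>& results) {
    assert(pos <= input.size());
    if (depth_left == 0) {
        return false;  // give up here instead of overflowing the stack
    }
    if (pos == input.size() && state == fst.final_state) {
        results.push_back(out);
    }
    bool complete = true;
    auto it = fst.arcs.find(state);
    if (it == fst.arcs.end()) {
        return complete;
    }
    for (const Arc& arc : it->second) {
        const bool is_epsilon = (arc.input == 0);
        if (!is_epsilon && (pos >= input.size() || input[pos] != arc.input)) {
            continue;  // arc does not match the next input symbol
        }
        if (arc.output != 0) out.push_back(arc.output);
        if (!lookup(fst, arc.target, input, is_epsilon ? pos : pos + 1,
                    out, depth_left - 1, results)) {
            complete = false;
        }
        if (arc.output != 0) out.pop_back();
    }
    return complete;
}
```

With an epsilon self-loop on the final state, the unbounded version of this
recursion never returns. The bounded one stops once the budget is used up
and reports that the results are incomplete, so the caller can reject the
morphology instead of crashing.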


> #47643 0x00007ffff6ac4f28 in HWFST::lookup_all (t=0x19,
>     input_string=0x3ba3a6a, skip_symbols=0x3ba3a6a) at ofst/hofst.C:1878

This is also very suspicious: input_string and skip_symbols have the
same address. Either optimizations have confused gdb somehow, or
something is very broken. If the vector of longs somehow gets converted
into the set of longs, though, that would explain the breakage, since
lookup would end up treating every character in the input as an epsilon.
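
The effect of such a mix-up can be illustrated with a small sketch. It
assumes, as a simplification, that lookup ignores any input symbol found
in the skip set; Symbol and consumed_symbols are hypothetical names, and
none of this is taken from the actual HFST sources.

```cpp
#include <cstddef>
#include <set>
#include <vector>

using Symbol = long;

// Counts how many input symbols a lookup would actually consume, under the
// assumption that symbols in `skip` are treated like epsilons.
std::size_t consumed_symbols(const std::vector<Symbol>& input,
                             const std::set<Symbol>& skip) {
    std::size_t consumed = 0;
    for (Symbol s : input) {
        if (skip.count(s) == 0) {
            ++consumed;
        }
    }
    return consumed;
}
```

If the skip set is built from the input itself, for instance with
std::set<Symbol>(input.begin(), input.end()), then consumed_symbols
returns 0 for any input: no symbol ever advances the input position, which
is the situation in which a recursive lookup has nothing to stop it.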



-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>


