[libvoikko] Strange bug in the interface between hfst-ospell and libvoikko

Børre Gaup borre.gaup at uit.no
Thu Dec 19 14:32:57 EET 2013


On Mon, 2013-12-16 at 23:02 +0000, Gaup Børre wrote:
> On Monday 16. of December 2013 22.05.56 Sjur Moshagen wrote:
> > On 16 Dec 2013 at 20:03, Harri Pitkänen <hatapitk at iki.fi> wrote:
> > > I think the problem is not there but in the implementation of
> > > extract_to_mem. That is where the infinite loop occurs on Windows. The
> > > current implementation has multiple problems (uses size_t instead of
> > > signed ssize_t and thus cannot handle error codes) and seems to have
> > > problems with buffer positions if it loops more than once. I failed to
> > > fix it though. I think something like this
> > > should work:
> > [...]
> > 
> > > But it does not. It will error out with ARCHIVE_FAILED on Windows. Don't
> > > know what it would do on Linux.
> > 
> > At least on MacOSX 10.9 it seems to work just as well/badly as the old code:
> > 
> > * seems to work fine with hfst-ospell on its own
> > * crashes voikkospell with the same error:
> > 
> > $ voikkospell -l -p tools/spellcheckers/fstbased/hfst/
> > libc++abi.dylib: terminating with uncaught exception of type
> > hfst_ol::ZHfstXmlParsingError Abort trap: 6
> > 
> 
> Output from MacOSX 10.8
> 
> voikkospell + hfst-ospell with libxmlpp backend
> 
> sma $ voikkospell -l -p tools/spellcheckers/fstbased/hfst -d sma
> sma-x-standard: Giellatekno/Divvun/UiT fst-based speller for Southern Sami
> sma $ voikkospell -L -p tools/spellcheckers/fstbased/hfst -d sma
> spell:sma
> sma $ voikkospell -p tools/spellcheckers/fstbased/hfst -d sma
> Entity: line 29: parser error : Extra content at the end of the document
> tor type="general" id="acceptor.default.hfst">
> ^
> libc++abi.dylib: terminate called throwing an exception
> Abort trap: 6
> sma $ voikkospell -s -p tools/spellcheckers/fstbased/hfst -d sma
> Entity: line 29: parser error : Extra content at the end of the document
> tor type="general" id="acceptor.default.hfst">
> ^
> libc++abi.dylib: terminate called throwing an exception
> Abort trap: 6
> 
> voikkospell + hfst-ospell with tinyxml2 backend
> 
> sma $ voikkospell -L -p tools/spellcheckers/fstbased/hfst -d sma
> spell:sma
> sma $ voikkospell -l -p tools/spellcheckers/fstbased/hfst -d sma
> sma-x-standard: Giellatekno/Divvun/UiT fst-based speller for Southern Sami
> sma $ voikkospell -s -p tools/spellcheckers/fstbased/hfst -d sma
> libc++abi.dylib: terminate called throwing an exception
> Abort trap: 6
> 
> 

I have replaced libarchive with libzip in hfst-ospell and tested it and
voikkospell on Linux (Kubuntu 13.10), Mac OS X 10.6 and 10.8, using both
the libxmlpp and tinyxml2 backends in hfst-ospell. The result is
available at https://github.com/albbas/hfstospell.git
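
For reference, the quoted discussion names two problems in extract_to_mem:
an unsigned return type that hides error codes, and buffer positions that go
wrong after the first pass of the loop. Below is a minimal sketch of a read
loop that avoids both, written in Python for brevity. It is not the code in
the repository above. read_into and chunked_reader are hypothetical stand-ins
for a C read call such as archive_read_data or zip_fread, which return a
signed count.

```python
import io


def extract_to_mem(read_into, total_size):
    """Read exactly total_size bytes via read_into(view) -> int.

    read_into is a hypothetical stand-in for a C read call: it fills the
    writable buffer it is given and returns the number of bytes written,
    0 at end of data, or a negative error code.
    """
    buf = bytearray(total_size)
    view = memoryview(buf)
    pos = 0
    while pos < total_size:
        # Always write at the current offset, never at the buffer start.
        n = read_into(view[pos:])
        if n < 0:
            # A signed result keeps error codes visible.
            raise IOError(f"read failed with code {n}")
        if n == 0:
            # Premature end of data: stop instead of looping forever.
            raise EOFError(f"expected {total_size} bytes, got {pos}")
        pos += n
    return bytes(buf)


def chunked_reader(data, chunk):
    """Hypothetical reader that hands out at most `chunk` bytes per call."""
    src = io.BytesIO(data)

    def read_into(view):
        return src.readinto(view[:chunk])

    return read_into
```

With a reader that delivers 100 bytes per call, the loop runs several times
and still reassembles the data in order; a reader that returns a negative
value or runs dry early raises instead of spinning.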

To be able to compile libvoikko on OS X, I had to do this: 
sudo cp /opt/local/lib/libzip/include/zipconf.h /opt/local/include/
because I don't know how to set up libzip properly in automake.

I have used sma and smj from giellatekno as my test languages.

Both hfst-ospell and voikkospell work on Linux and OS X 10.8.

With the libxmlpp backend on 10.6, voikkospell crashes with this message:


sma $ voikkospell -s -p tools/spellcheckers/fstbased/hfst -d sma
Entity: line 29: parser error : Extra content at the end of the document
�N�����:��;�v�^�6�v�)��K�^^F��a�';�EO�v�)��J/��n��N��V��>��z;W�n��6;��<
^
terminate called after throwing an instance of 'xmlpp::parse_error'
  what():  Document not well-formed.
Line 29, column 1 (fatal):
Extra content at the end of the document

Abort trap


If I change the order of the files in the .zhfst so that index.xml comes
first, then this setup works on 10.6, too.

* Order that leads to crash *
smj $ unzip -l tools/spellcheckers/fstbased/hfst/3/smj.zhfst
Archive:  tools/spellcheckers/fstbased/hfst/3/smj.zhfst
  Length     Date   Time    Name
 --------    ----   ----    ----
 21066916  12-19-13 12:49   acceptor.default.hfst
 63777854  12-19-13 12:51   errmodel.default.hfst
     1218  12-14-13 10:01   index.xml
 --------                   -------
 84845988                   3 files

* Order that makes voikkospell on 10.6 work, too *
smj $ unzip -l tools/spellcheckers/fstbased/hfst/3/smj.zhfst 
Archive:  tools/spellcheckers/fstbased/hfst/3/smj.zhfst
  Length     Date   Time    Name
 --------    ----   ----    ----
     1218  12-14-13 10:01   index.xml
 21066916  12-19-13 12:49   acceptor.default.hfst
 63777854  12-19-13 12:51   errmodel.default.hfst
 --------                   -------
 84845988                   3 files
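
The two listings differ only in where index.xml sits in the archive. The
binary garbage in the parser error would be consistent with transducer data
reaching the XML parser, but that is a guess, not something the output
proves. As an illustration only, the sketch below uses Python's zipfile
module (not libzip, and not the hfst-ospell code) to show that looking an
entry up by name gives the same bytes in either order, while "take the first
entry" does not. The payloads are placeholders, not the real file contents.

```python
import io
import zipfile


def build_archive(names_in_order, payloads):
    """Build an in-memory zip whose entries are stored in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names_in_order:
            zf.writestr(name, payloads[name])
    return buf.getvalue()


def read_entry(archive_bytes, name):
    """Return one entry's bytes, located by name rather than by position."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)


def first_entry_name(archive_bytes):
    """Return the name of the entry that was stored first."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.namelist()[0]


# Placeholder contents; the real files are far larger.
payloads = {
    "index.xml": b"<placeholder/>",
    "acceptor.default.hfst": b"\x00\x01 binary placeholder",
    "errmodel.default.hfst": b"\x02\x03 binary placeholder",
}

# Same entry orders as in the two unzip listings.
crash_order = ["acceptor.default.hfst", "errmodel.default.hfst", "index.xml"]
ok_order = ["index.xml", "acceptor.default.hfst", "errmodel.default.hfst"]

crash_zip = build_archive(crash_order, payloads)
ok_zip = build_archive(ok_order, payloads)
```

Here read_entry(crash_zip, "index.xml") and read_entry(ok_zip, "index.xml")
return the same bytes, whereas first_entry_name differs between the two
archives. A reader that relies on entry position would therefore see
different data depending on how the .zhfst was packed.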




