[libvoikko] Strange bug in the interface between hfst-ospell and libvoikko
Børre Gaup
borre.gaup at uit.no
Thu Dec 19 14:32:57 EET 2013
On Mon, 2013-12-16 at 23:02 +0000, Gaup Børre wrote:
> On Monday 16. of December 2013 22.05.56 Sjur Moshagen wrote:
> > On 16 Dec 2013, at 20:03, Harri Pitkänen <hatapitk at iki.fi> wrote:
> > > I think the problem is not there but in the implementation of
> > > extract_to_mem. That is where the infinite loop occurs on Windows. The
> > > current implementation has multiple problems (uses size_t instead of
> > > signed ssize_t and thus cannot handle error codes) and seems to have
> > > problems with buffer positions if it loops more than once. I failed to
> > > fix it though. I think something like this
> > > should work:
> > [...]
> >
> > > But it does not. It will error out with ARCHIVE_FAILED on Windows. Don't
> > > know what it would do on Linux.
> >
> > At least on MacOSX 10.9 it seems to work as well (or as badly) as the old code:
> >
> > * seems to work fine with hfst-ospell on its own
> > * crashes voikkospell with the same error:
> >
> > $ voikkospell -l -p tools/spellcheckers/fstbased/hfst/
> > libc++abi.dylib: terminating with uncaught exception of type
> > hfst_ol::ZHfstXmlParsingError Abort trap: 6
> >
>
> Output from MacOSX 10.8
>
> voikkospell + hfst-ospell with libxmlpp backend
>
> sma $ voikkospell -l -p tools/spellcheckers/fstbased/hfst -d sma
> sma-x-standard: Giellatekno/Divvun/UiT fst-based speller for Southern Sami
> sma $ voikkospell -L -p tools/spellcheckers/fstbased/hfst -d sma
> spell:sma
> sma $ voikkospell -p tools/spellcheckers/fstbased/hfst -d sma
> Entity: line 29: parser error : Extra content at the end of the document
> tor type="general" id="acceptor.default.hfst">
> ^
> libc++abi.dylib: terminate called throwing an exception
> Abort trap: 6
> sma $ voikkospell -s -p tools/spellcheckers/fstbased/hfst -d sma
> Entity: line 29: parser error : Extra content at the end of the document
> tor type="general" id="acceptor.default.hfst">
> ^
> libc++abi.dylib: terminate called throwing an exception
> Abort trap: 6
>
> voikkospell + hfst-ospell with tinyxml2 backend
>
> sma $ voikkospell -L -p tools/spellcheckers/fstbased/hfst -d sma
> spell:sma
> sma $ voikkospell -l -p tools/spellcheckers/fstbased/hfst -d sma
> sma-x-standard: Giellatekno/Divvun/UiT fst-based speller for Southern Sami
> sma $ voikkospell -s -p tools/spellcheckers/fstbased/hfst -d sma
> libc++abi.dylib: terminate called throwing an exception
> Abort trap: 6
>
>
I have replaced libarchive with libzip in hfst-ospell and tested it and
voikkospell on Linux (Kubuntu 13.10), Mac OS X 10.6 and 10.8, using both
the libxmlpp and tinyxml2 backends in hfst-ospell. The result is
available at https://github.com/albbas/hfstospell.git
To be able to compile libvoikko on OS X, I had to do this:
sudo cp /opt/local/lib/libzip/include/zipconf.h /opt/local/include/
because I don't know how to set up libzip properly in automake.
I have used sma and smj from Giellatekno as my test languages.
Both hfst-ospell and voikkospell work on Linux and 10.8.
With the libxmlpp backend on 10.6, voikkospell crashes with this message:
sma $ voikkospell -s -p tools/spellcheckers/fstbased/hfst -d sma
Entity: line 29: parser error : Extra content at the end of the document
[non-printable binary data]
^
terminate called after throwing an instance of 'xmlpp::parse_error'
what(): Document not well-formed.
Line 29, column 1 (fatal):
Extra content at the end of the document
Abort trap
If I change the order of the files in the .zhfst then this setup works
on 10.6, too.
* Order that leads to crash *
smj $ unzip -l tools/spellcheckers/fstbased/hfst/3/smj.zhfst
Archive: tools/spellcheckers/fstbased/hfst/3/smj.zhfst
  Length     Date   Time    Name
 --------    ----   ----    ----
 21066916  12-19-13 12:49   acceptor.default.hfst
 63777854  12-19-13 12:51   errmodel.default.hfst
     1218  12-14-13 10:01   index.xml
 --------                   -------
 84845988                   3 files
* Order that makes voikkospell on 10.6 work, too *
smj $ unzip -l tools/spellcheckers/fstbased/hfst/3/smj.zhfst
Archive: tools/spellcheckers/fstbased/hfst/3/smj.zhfst
  Length     Date   Time    Name
 --------    ----   ----    ----
     1218  12-14-13 10:01   index.xml
 21066916  12-19-13 12:49   acceptor.default.hfst
 63777854  12-19-13 12:51   errmodel.default.hfst
 --------                   -------
 84845988                   3 files
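
To illustrate the member-order observation, here is a minimal Python sketch.
It is not hfst-ospell's actual code. It builds two in-memory zip archives
holding the same three member names in the two orders listed above, then
reads index.xml by name from each. The payloads are placeholders, and the
helper names (build_archive, read_member, member_names) are made up for this
example. It only shows that a by-name lookup should not depend on member
order; it does not explain why the 10.6 build behaves differently.

```python
import io
import zipfile

# Placeholder payloads (assumption): real .zhfst members are large binary
# transducers plus an XML index; only the member names come from the listing.
PAYLOADS = {
    "index.xml": b"<?xml version='1.0'?><index/>",
    "acceptor.default.hfst": b"\x00fake-acceptor",
    "errmodel.default.hfst": b"\x00fake-errmodel",
}


def build_archive(member_order):
    """Return an in-memory zip whose members are written in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in member_order:
            zf.writestr(name, PAYLOADS[name])
    buf.seek(0)
    return buf


def member_names(archive):
    """List member names in the order they are stored in the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def read_member(archive, name):
    """Read one member by name, regardless of where it sits in the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


if __name__ == "__main__":
    crash_order = ["acceptor.default.hfst", "errmodel.default.hfst", "index.xml"]
    ok_order = ["index.xml", "acceptor.default.hfst", "errmodel.default.hfst"]
    for order in (crash_order, ok_order):
        archive = build_archive(order)
        print(member_names(archive), read_member(archive, "index.xml"))
```

If a reader instead takes "the first member" or walks the archive
sequentially and hands whatever it finds to the XML parser, the result would
depend on the order, which would be consistent with binary data showing up in
the parser error. That is only a guess on my part.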