[libvoikko] Strange bug in the interface between hfst-ospell and libvoikko

Harri Pitkänen hatapitk at iki.fi
Mon Dec 16 20:03:25 EET 2013


On Monday 16 December 2013 17:24:46 Sjur Moshagen wrote:
> Patch applied in hfst svn at r3653.
> 
> Please test, and see if this helps. Also note that there is one corner case
> that is not handled: when the extra noise contains the character ‘>’.
> 
> Suggestions for alternative/improved ways of handling this, or to correct
> the behavior of libarchive would be very welcome.

I think the problem is not there but in the implementation of extract_to_mem. 
That is where the infinite loop occurs on Windows. The current implementation 
has multiple problems: it uses an unsigned size_t where a signed ssize_t is 
needed, so it cannot detect negative error codes, and it seems to mishandle 
buffer positions if it loops more than once. I failed to fix it, though. I 
think something like this should work:

static
void*
extract_to_mem(archive* ar, archive_entry* entry, size_t* n)
  {
    size_t full_length = 0;
    const struct stat* st = archive_entry_stat(entry);
    size_t buffsize = st->st_size;
    char * buff = (char*)malloc(sizeof(char) * buffsize);
    for (;;)
      {
        ssize_t curr = archive_read_data(ar, buff + full_length,
                                         buffsize - full_length);
        if (0 == curr)
          {
            break;
          }
        else if (ARCHIVE_RETRY == curr)
          {
            continue;
          }
        else if (ARCHIVE_FAILED == curr)
          {
            throw ZHfstZipReadingError("Archive broken (ARCHIVE_FAILED)");
          }
        else if (curr < 0)
          {
            throw ZHfstZipReadingError("Archive broken...");
          } 
        else
          {
            full_length += curr;
          }
      }
    *n = full_length;
    return buff;
  }


But it does not. It fails with ARCHIVE_FAILED on Windows. I don't know 
what it would do on Linux.
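
To illustrate the size_t point in isolation, here is a minimal sketch that 
does not touch libarchive at all. Everything in it is an assumption made for 
illustration: fake_read is a hypothetical stand-in for a reader that returns 
a byte count, 0 at end of data, or a negative value on error; signed_count 
stands in for ssize_t; and the growable buffer is just one possible way to 
avoid tracking buffer positions by hand. It is not a fix for extract_to_mem.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Signed count type; stands in for POSIX ssize_t so the sketch also builds
// on platforms that do not provide ssize_t.
using signed_count = std::ptrdiff_t;

// Hypothetical data source, NOT part of libarchive or hfst-ospell.
struct FakeSource {
    std::string data;       // bytes to serve
    std::size_t pos;        // current read position
    std::size_t max_chunk;  // serve at most this many bytes per call
    bool fail;              // if true, every call reports an error
};

// Hypothetical reader: returns bytes copied, 0 at end of data,
// or a negative value on error.
signed_count fake_read(FakeSource& src, void* out, std::size_t len) {
    if (src.fail) return -25;  // arbitrary negative code, for illustration only
    std::size_t n = std::min({len, src.max_chunk, src.data.size() - src.pos});
    std::memcpy(out, src.data.data() + src.pos, n);
    src.pos += n;
    return static_cast<signed_count>(n);
}

// Read everything, keeping the result in a signed variable so that
// negative error codes stay distinguishable from byte counts.
std::vector<char> read_all(FakeSource& src) {
    std::vector<char> result;
    char chunk[4];  // deliberately tiny so the loop runs several times
    for (;;) {
        signed_count got = fake_read(src, chunk, sizeof chunk);
        if (got == 0) break;  // end of data
        if (got < 0) throw std::runtime_error("read failed");
        assert(static_cast<std::size_t>(got) <= sizeof chunk);
        result.insert(result.end(), chunk, chunk + got);
    }
    return result;
}

// The failure mode with an unsigned type: a negative error code converts
// to a huge positive value and is indistinguishable from "bytes were read".
bool looks_like_data_when_unsigned(signed_count code) {
    std::size_t as_unsigned = static_cast<std::size_t>(code);
    return as_unsigned > 0;
}
```

With a signed result, read_all throws on the first negative return. With an 
unsigned one, the same code would be added to the running length as if it 
were a byte count, which is the kind of confusion I suspect in the current 
implementation.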

Harri

