[libvoikko] Fixed limits

Harri Pitkänen hatapitk at iki.fi
Sat Dec 4 14:10:32 EET 2010


The following limits used to be defined in voikko_defines.h, which is part of
the public API of the library:

/* Fixed limits */
#define LIBVOIKKO_MAX_WORD_CHARS 255
#define LIBVOIKKO_MAX_ANALYSIS_COUNT 31

There are at least two problems with this:
- These particular limits may not be suitable for all backends and languages. 
Some may not need any limits at all and others might wish to use different 
limits.
- It is almost impossible to change these limits since they are defined in the 
header and thus the numerical values are part of the API too.
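
To illustrate the second point, here is a minimal sketch of client code that
sizes a buffer from such a constant. All names starting with sketch_ or
SKETCH_ are hypothetical and are not part of libvoikko. The value is compiled
into the client, so changing it later in the header has no effect on
binaries that are already built.

```c
#include <assert.h>
#include <wchar.h>

/* Stand-in for the value the header published when the client was compiled. */
#define SKETCH_MAX_WORD_CHARS 255

/* Hypothetical client-side buffer: its size is fixed at the client's build time. */
struct sketch_word_buffer {
    wchar_t chars[SKETCH_MAX_WORD_CHARS + 1];
};

/* Copies word into buf; returns 1 on success, 0 if the word does not fit. */
int sketch_store_word(struct sketch_word_buffer *buf, const wchar_t *word) {
    assert(buf != NULL && word != NULL);
    if (wcslen(word) > (size_t)SKETCH_MAX_WORD_CHARS) {
        return 0;
    }
    wcscpy(buf->chars, word);
    return 1;
}
```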

LIBVOIKKO_MAX_WORD_CHARS was inconsistently handled in our spelling functions. 
The function that takes UTF-8 strings used to return VOIKKO_SPELL_FAILED for 
overly long words whereas the function that takes wide character strings 
returned VOIKKO_INTERNAL_ERROR in the same situation.

It is time to fix these problems. I already made the following changes:
- VOIKKO_INTERNAL_ERROR is no longer used as the return code for overly long
words.
- LIBVOIKKO_MAX_ANALYSIS_COUNT is now marked as deprecated. A fixed limit with
the same value was added to MalagaAnalyzer.


In the future backends should just reject words as unknown if they cannot be 
processed for any reason. The original reason for these limits was to protect 
applications from denial of service conditions where processing some word 
could take years to complete or cause the computer to run out of memory. At
least the Malaga backend now handles these issues internally, so the limits
should not be strictly needed.
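
As a rough sketch of that idea, a backend could keep its own private limit
and report an overly long word the same way as any other unknown word. The
names, the toy dictionary and the byte-based limit below are assumptions made
for this example; they are not the actual libvoikko or Malaga code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not the libvoikko constants. */
enum sketch_spell_result { SKETCH_SPELL_FAILED = 0, SKETCH_SPELL_OK = 1 };

/* Hypothetical backend-private limit (in bytes), invisible to API users. */
static const size_t SKETCH_BACKEND_MAX_WORD_BYTES = 255;

/* Toy dictionary lookup standing in for a real backend. */
static int sketch_in_dictionary(const char *word) {
    return strcmp(word, "kissa") == 0 || strcmp(word, "koira") == 0;
}

/* Words the backend cannot process are rejected as unknown, not as errors. */
enum sketch_spell_result sketch_spell(const char *word) {
    assert(word != NULL);
    if (strlen(word) > SKETCH_BACKEND_MAX_WORD_BYTES) {
        return SKETCH_SPELL_FAILED;
    }
    return sketch_in_dictionary(word) ? SKETCH_SPELL_OK : SKETCH_SPELL_FAILED;
}
```

With this shape the limit can be changed or removed per backend without
touching any public header.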

Deprecating LIBVOIKKO_MAX_WORD_CHARS does not cause any compatibility issues. 
It just means that some words that are longer than 255 characters can actually 
be handled in the future (in case there are any reasonable words that are 
longer than that). LIBVOIKKO_MAX_ANALYSIS_COUNT is a bit more problematic. If
a developer has assumed that the analysis function never returns more than 31
results, there could be problems. But I'm not aware of any program that makes
that assumption, so we can deprecate that one too.
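
For callers that want to be safe either way, a minimal sketch of processing
results without any assumed maximum could look like the following. It assumes
the results arrive as a NULL-terminated array of pointers; the struct and
function names are hypothetical stand-ins, not the libvoikko types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an analysis record; not the libvoikko type. */
struct sketch_analysis {
    const char *base_form;
};

/* Calls visit() for every analysis; returns how many were visited.
 * No fixed upper bound is assumed, only the terminating NULL. */
size_t sketch_for_each_analysis(struct sketch_analysis *const *analyses,
                                void (*visit)(const struct sketch_analysis *, void *),
                                void *ctx) {
    size_t n = 0;
    assert(visit != NULL);
    if (analyses == NULL) {
        return 0;
    }
    for (; analyses[n] != NULL; n++) {
        visit(analyses[n], ctx);
    }
    return n;
}

/* Example callback: ctx points to a size_t counter that is incremented. */
void sketch_tally(const struct sketch_analysis *analysis, void *ctx) {
    (void)analysis;
    (*(size_t *)ctx)++;
}
```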

The constants will stay (with suitable warnings) in the libvoikko headers but 
I will not include them in the Java API and I will remove them from our Python 
API. The Java and Python APIs are not totally frozen (and will not be in the 
near future) so I think it is OK to do that.

Harri


