[libvoikko] Fixed limits
hatapitk at iki.fi
Sat Dec 4 14:10:32 EET 2010
The following limits used to be defined in voikko_defines.h which is part of
the public API of the library:
/* Fixed limits */
#define LIBVOIKKO_MAX_WORD_CHARS 255
#define LIBVOIKKO_MAX_ANALYSIS_COUNT 31
There are at least two problems with this:
- These particular limits may not be suitable for all backends and languages.
  Some may not need any limits at all and others might wish to use different
  limits.
- It is almost impossible to change these limits since they are defined in the
  header and thus the numerical values are part of the API too.
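
To illustrate the second point, here is a minimal sketch of how an application
can end up depending on the numerical value. The helper
copy_word_for_spelling is hypothetical and not part of libvoikko. The macro is
repeated only so that the example compiles on its own, and the length is
counted in bytes for simplicity.

```c
#include <stddef.h>
#include <string.h>

/* Repeated from voikko_defines.h so that this sketch is self-contained. */
#define LIBVOIKKO_MAX_WORD_CHARS 255

/* The buffer size is fixed when the application is compiled, so the value
 * 255 ends up in the application binary. A later library release cannot
 * change it without the application being rebuilt. */
static char word_buffer[LIBVOIKKO_MAX_WORD_CHARS + 1];

/* Hypothetical application helper (not part of libvoikko).
 * Copies word into the fixed buffer. Returns the buffer, or NULL if the
 * word is longer than the limit the application was compiled against. */
const char * copy_word_for_spelling(const char * word) {
	size_t len = strlen(word);
	if (len > LIBVOIKKO_MAX_WORD_CHARS) {
		return NULL;
	}
	memcpy(word_buffer, word, len + 1); /* len + 1 includes the terminator */
	return word_buffer;
}
```

An application written like this keeps rejecting words longer than 255
characters even if the library itself later accepts them.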
LIBVOIKKO_MAX_WORD_CHARS was inconsistently handled in our spelling functions.
The function that takes UTF-8 strings used to return VOIKKO_SPELL_FAILED for
overly long words whereas the function that takes wide character strings
returned VOIKKO_INTERNAL_ERROR in the same situation.
It is time to fix these problems. I already made the following changes:
- VOIKKO_INTERNAL_ERROR is no longer used as the return code for overly long
  words.
- LIBVOIKKO_MAX_ANALYSIS_COUNT is now marked as deprecated. A fixed limit with
  the same value was added to MalagaAnalyzer.
In the future, backends should just reject words as unknown if they cannot be
processed for any reason. The original reason for these limits was to protect
applications from denial of service conditions where processing some word
could take years to complete or cause the computer to run out of memory. At
least the Malaga backend now handles these issues internally, so the limits
should not be strictly needed.
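
As a rough sketch of what "reject words as unknown" could look like inside a
backend: the limit below is private to the backend and is not visible in any
public header. All example_* names are hypothetical, the lookup is a stand-in
for a real dictionary, and the return code values are assumed to match those
in voikko_defines.h.

```c
#include <stddef.h>
#include <wchar.h>

/* Return codes; values assumed to match voikko_defines.h. */
#define VOIKKO_SPELL_FAILED 0
#define VOIKKO_SPELL_OK 1

/* Hypothetical backend-private limit. Another backend could pick a
 * different value or have no limit at all. */
#define EXAMPLE_BACKEND_MAX_WORD_CHARS 255

/* Hypothetical stand-in for a real dictionary lookup: knows only "kissa". */
static int example_backend_lookup(const wchar_t * word) {
	return wcscmp(word, L"kissa") == 0;
}

/* Words that the backend cannot process are reported as unknown
 * (VOIKKO_SPELL_FAILED), never as an internal error. */
int example_backend_spell(const wchar_t * word) {
	if (wcslen(word) > EXAMPLE_BACKEND_MAX_WORD_CHARS) {
		return VOIKKO_SPELL_FAILED;
	}
	return example_backend_lookup(word) ? VOIKKO_SPELL_OK : VOIKKO_SPELL_FAILED;
}
```

With this arrangement the caller sees the same result for an overly long word
as for any other unknown word.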
Deprecating LIBVOIKKO_MAX_WORD_CHARS does not cause any compatibility issues.
It just means that some words that are longer than 255 characters can actually
be handled in the future (in case there are any reasonable words that are
longer than that). LIBVOIKKO_MAX_ANALYSIS_COUNT is a bit more problematic. If
a developer has assumed that the analysis function never returns more than 31
results, there could be problems. But I'm not aware of any program that makes
that assumption, so we can deprecate that one too.
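
An application can avoid depending on the count altogether. The sketch below
assumes that the analysis results arrive as a NULL-terminated array. Plain
strings are used here as a stand-in for the real analysis type, and
count_analyses is a hypothetical helper, not a libvoikko function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libvoikko). Counts the entries of a
 * NULL-terminated result array without assuming any upper bound such as
 * LIBVOIKKO_MAX_ANALYSIS_COUNT. A NULL array is treated as "no results". */
size_t count_analyses(const char * const * analyses) {
	size_t count = 0;
	if (analyses == NULL) {
		return 0;
	}
	while (analyses[count] != NULL) {
		count++;
	}
	return count;
}
```

Code that walks the results this way keeps working whether a backend returns
3 analyses or 300.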
The constants will stay (with suitable warnings) in the libvoikko headers but
I will not include them in the Java API and I will remove them from our Python
API. The Java and Python APIs are not totally frozen (and will not be in the
near future) so I think it is OK to do that.