[libvoikko] Relation between spelling and grammar checkers
Harri Pitkänen
hatapitk at iki.fi
Thu Sep 19 22:55:46 EEST 2013
From a high-level view, a "dictionary" in libvoikko is an object that
associates a language tag with a subset of the functions in the following
list:
* Morphological analysis
* Spell checking
* Spelling correction
* Hyphenation
* Grammar checking
* Tokenization
* Identification of sentence boundaries
So far there have been only two kinds of dictionary objects:
- Format version 2 supports all of the functionality
- Format version 3 supports spell checking and spelling correction. If you
try to use other functions, the library should not crash, but the results are
more or less undefined.
Now it seems to me that format version 4 would support morphological analysis
and grammar checking. This introduces a problem that needs to be addressed:
how should we handle situations where we have dictionaries of format versions
3 and 4 for the same language tag?
Ideally, the application that uses libvoikko would then be able to do spell
checking, spelling correction, morphological analysis and grammar checking for
that language. This can be achieved in two ways:
1) During dictionary loading both dictionaries are loaded and wired into the
same voikko_options_t structure. This would be quite easy to implement.
* There is one complicated corner case, though: what if the language tags
represent different variants of the same dictionary? If the user requested
"sme-x-medicine" and we have only a medical spell checker but a standard
grammar checker, can the standard grammar checker be used as a
substitute?
2) We can also require that all grammar checkers must also provide a spell
checker. This will simplify the logic: format 4 would always hide a
format 3 dictionary if they have the same language tag.
* The variant issue would still be present. We might have a spell checker
in format 3 for some variant and only the standard dictionary in format 4.
To me it does not really matter which option we choose. But this choice will
affect the implementation of dictionary loading, so making the decision is
important.
Harri