[libvoikko] Relation between spelling and grammar checkers

Harri Pitkänen hatapitk at iki.fi
Thu Sep 19 22:55:46 EEST 2013


From a high-level view, a "dictionary" in libvoikko is an object that
associates a language tag with a subset of the following functionalities:

 * Morphological analysis
 * Spell checking
 * Spelling correction
 * Hyphenation
 * Grammar checking
 * Tokenization
 * Identification of sentence boundaries
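To make the idea concrete, here is a minimal C sketch of a dictionary as a
language tag paired with a capability set. The `CAP_*` flags, `struct
dictionary` and `dict_supports` are hypothetical illustrations, not part of
libvoikko's API.

```c
#include <stdbool.h>

/* Hypothetical capability flags, one per functionality listed above. */
enum dict_capability {
    CAP_MORPHOLOGY = 1u << 0, /* morphological analysis */
    CAP_SPELL      = 1u << 1, /* spell checking */
    CAP_SUGGEST    = 1u << 2, /* spelling correction */
    CAP_HYPHENATE  = 1u << 3, /* hyphenation */
    CAP_GRAMMAR    = 1u << 4, /* grammar checking */
    CAP_TOKENIZE   = 1u << 5, /* tokenization */
    CAP_SENTENCES  = 1u << 6  /* sentence boundary identification */
};

/* Hypothetical dictionary record: a language tag plus a capability set. */
struct dictionary {
    const char *language_tag;  /* e.g. "fi" or "sme-x-medicine" */
    unsigned int capabilities; /* bitwise OR of enum dict_capability values */
};

/* Returns true if the dictionary provides every capability in `wanted`. */
bool dict_supports(const struct dictionary *dict, unsigned int wanted)
{
    return (dict->capabilities & wanted) == wanted;
}
```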

So far there have been only two kinds of dictionary objects:

 - Format version 2 supports all of the functionality
 - Format version 3 supports spell checking and spelling correction. If you
   try to use other functions, the library should not crash, but the results
   are more or less undefined.

Now it seems to me that format version 4 would support morphological analysis
and grammar checking. This introduces a problem that needs to be addressed:
how should we handle situations where we have dictionaries of format versions
3 and 4 for the same language tag?
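The capability sets of the existing formats, plus the proposed format 4, can
be sketched as a simple mapping. `format_capabilities` and the `CAP_*` flags
are hypothetical; the point is that neither format 3 nor format 4 alone covers
spell checking, spelling correction, morphological analysis and grammar
checking, while their union does.

```c
/* Hypothetical capability flags (not libvoikko's actual API). */
enum {
    CAP_MORPHOLOGY = 1u << 0,
    CAP_SPELL      = 1u << 1,
    CAP_SUGGEST    = 1u << 2,
    CAP_HYPHENATE  = 1u << 3,
    CAP_GRAMMAR    = 1u << 4,
    CAP_TOKENIZE   = 1u << 5,
    CAP_SENTENCES  = 1u << 6,
    CAP_ALL        = (1u << 7) - 1
};

/* Capabilities per dictionary format version as described in this message.
 * Format 4 is only a proposal; unknown versions provide nothing. */
unsigned int format_capabilities(int format_version)
{
    switch (format_version) {
    case 2:
        return CAP_ALL;
    case 3:
        return CAP_SPELL | CAP_SUGGEST;
    case 4:
        return CAP_MORPHOLOGY | CAP_GRAMMAR; /* proposed */
    default:
        return 0;
    }
}
```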

Ideally, the application that uses libvoikko would then be able to do spell
checking, spelling correction, morphological analysis and grammar checking for
that language. This can be achieved in two ways:

 1) During dictionary loading both dictionaries are loaded and wired into the
    same voikko_options_t structure. This would be quite easy to implement.
    * There is one complicated corner case, though: what if the language tags
      represent different variants of the same dictionary? If the user
      requested "sme-x-medicine" and we have only a medical spell checker and
      a standard grammar checker, can the standard grammar checker be used as
      a substitute?

 2) We can also require that every grammar checker provide a spell checker as
    well. This would simplify the logic: a format 4 dictionary would always
    hide a format 3 dictionary with the same language tag.
    * The variant issue would still be present. We might have a spell checker
      in format 3 for some variant and only a standard dictionary in format 4.
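Option 1 could look roughly like the following sketch, which picks a provider
for each function from all loaded dictionaries and makes the variant
substitution an explicit policy flag. `struct wired_options` is a hypothetical
stand-in for the extra fields that would go into voikko_options_t (its real
layout is not shown here), and treating the part before "-x-" as the standard
variant is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability flags (only the ones needed here). */
enum {
    CAP_MORPHOLOGY = 1u << 0,
    CAP_SPELL      = 1u << 1,
    CAP_SUGGEST    = 1u << 2,
    CAP_GRAMMAR    = 1u << 4
};

struct dictionary {
    const char *language_tag;
    unsigned int capabilities;
};

/* Hypothetical per-function wiring: each function points at the dictionary
 * that serves it, or NULL if no loaded dictionary can. */
struct wired_options {
    const struct dictionary *speller;  /* spell checking and correction */
    const struct dictionary *analyzer; /* morphological analysis */
    const struct dictionary *grammar;  /* grammar checking */
};

/* True if `tag` is the base of a private-use variant such as "sme-x-medicine"
 * (assumption: everything before "-x-" names the standard variant). */
static bool is_base_of_variant(const char *tag, const char *requested)
{
    const char *variant = strstr(requested, "-x-");
    if (variant == NULL)
        return false;
    size_t base_len = (size_t)(variant - requested);
    return strlen(tag) == base_len && strncmp(tag, requested, base_len) == 0;
}

/* Finds a dictionary providing `cap`. An exact tag match always wins; a
 * standard-variant substitute is used only if the policy allows it. */
static const struct dictionary *find_provider(const struct dictionary *dicts,
                                              size_t n, const char *requested,
                                              unsigned int cap,
                                              bool allow_substitute)
{
    const struct dictionary *substitute = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dicts[i].capabilities & cap) != cap)
            continue;
        if (strcmp(dicts[i].language_tag, requested) == 0)
            return &dicts[i];
        if (allow_substitute && substitute == NULL &&
            is_base_of_variant(dicts[i].language_tag, requested))
            substitute = &dicts[i];
    }
    return substitute;
}

/* Wires all loaded dictionaries into one options structure (option 1). */
struct wired_options wire_dictionaries(const struct dictionary *dicts, size_t n,
                                       const char *requested,
                                       bool allow_substitute)
{
    struct wired_options opts;
    opts.speller = find_provider(dicts, n, requested, CAP_SPELL | CAP_SUGGEST,
                                 allow_substitute);
    opts.analyzer = find_provider(dicts, n, requested, CAP_MORPHOLOGY,
                                  allow_substitute);
    opts.grammar = find_provider(dicts, n, requested, CAP_GRAMMAR,
                                 allow_substitute);
    return opts;
}
```

With a format 3 "sme-x-medicine" speller and a format 4 "sme" grammar checker
loaded, a request for "sme-x-medicine" gets a grammar checker only when
`allow_substitute` is true, so the open question becomes a single policy
decision.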

To me it does not really matter which of the options we choose, but the
choice will affect the implementation of dictionary loading, so making the
decision is important.
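For comparison, the hiding rule of option 2 reduces to choosing a single
dictionary per exact language tag. Again, `select_dictionary` and `hides` are
hypothetical names, and only formats 3 and 4 are ranked against each other.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct dictionary {
    const char *language_tag;
    int format_version; /* 3: spelling only; 4: proposed grammar + spelling */
};

/* Under option 2 every format 4 dictionary also carries a spell checker,
 * so it hides a format 3 dictionary with the same language tag. */
static bool hides(const struct dictionary *candidate,
                  const struct dictionary *current)
{
    return candidate->format_version == 4 && current->format_version == 3;
}

/* Picks one dictionary for an exact language tag match, or NULL if none
 * exists. No variant substitution is attempted here. */
const struct dictionary *select_dictionary(const struct dictionary *dicts,
                                           size_t n, const char *requested)
{
    const struct dictionary *chosen = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dicts[i].language_tag, requested) != 0)
            continue;
        if (chosen == NULL || hides(&dicts[i], chosen))
            chosen = &dicts[i];
    }
    return chosen;
}
```

With "sme" dictionaries in formats 3 and 4 and a format 3 "sme-x-medicine"
dictionary, a request for "sme-x-medicine" still gets only the medical spell
checker, which is the variant issue noted under option 2.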

Harri


