[libvoikko] Voikko, cyrillic and case handling

Sjur Moshagen sjurnm at mac.com
Tue Jan 24 21:22:54 EET 2012


It seems that voikko handles upper-cased text in its own internal code, but only for languages written in the Latin script:

Giella - using voikko+hfst, this is accepted as an initial upper-cased variant of giella
Giella - using hfst-ospell, this is NOT accepted, because the accepting transducer does not include case handling

This is all fine, I just wanted to get a confirmation that I have understood things correctly.

Now, we are testing the voikko+hfst combo with a couple of Cyrillic-script languages as well, and it seems that voikko is not able to handle uppercasing for those languages. Is this correct?

In which file(s) is uppercasing defined? Would it be ok to add Cyrillic support there (and send in a patch if it seems to work ok)? Or do you prefer a different solution for handling case in non-Latin (or all) casing languages/scripts?
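
To make the question concrete, here is a minimal sketch of the kind of script-independent case fallback described above. It is not libvoikko's actual code and does not use the libvoikko or hfst-ospell APIs: the function name, the `accepts` callback and the tiny lexicon are all hypothetical, and the callback merely stands in for a lower-case-only acceptor such as the transducer. The sketch relies on Python's built-in string methods, which apply Unicode case mappings and therefore treat Latin and Cyrillic letters alike.

```python
def accepts_with_case_fallback(word, accepts):
    """Return True if `accepts` takes the word as written or after case folding.

    `accepts` is a hypothetical callback standing in for a speller that
    only knows lexicon forms (for example, lower-case entries).
    """
    # 1. Try the word exactly as written.
    if accepts(word):
        return True

    # 2. Initial upper case: retry with only the first letter lowered,
    #    e.g. "Giella" -> "giella".
    if word[:1].isupper():
        if accepts(word[:1].lower() + word[1:]):
            return True

    # 3. All upper case: retry fully lowered, then as a capitalised form
    #    (the latter covers proper nouns stored with an initial capital).
    if len(word) > 1 and word.isupper():
        lowered = word.lower()
        if accepts(lowered):
            return True
        if accepts(lowered[:1].upper() + lowered[1:]):
            return True

    return False


# Hypothetical stand-in lexicon: one Latin-script and one Cyrillic-script entry.
LEXICON = {"giella", "книга"}


def lexicon_accepts(word):
    """Accept only exact lexicon forms, like a transducer without case handling."""
    return word in LEXICON
```

With this wrapper, `accepts_with_case_fallback("Giella", lexicon_accepts)` and `accepts_with_case_fallback("Книга", lexicon_accepts)` take the same code path, while a mixed-case form such as "gIella" is still rejected. A C or C++ implementation would need an equally Unicode-aware case mapping rather than a Latin-only table.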


