[libvoikko] Voikko, cyrillic and case handling

Sjur Moshagen sjurnm at mac.com
Fri Jan 27 11:04:29 EET 2012


On 25 Jan 2012, at 22:42, Sjur Moshagen wrote:

> Perhaps it was a bit premature to cancel it completely. I restarted LibreOffice to clear any lingering problems, removed voikko, and reinstalled it. Now I *do* get suggestions in several cases, but my problematic document still does not give the same set of suggestions in LibreOffice as it does with voikkospell. Most misspellings that get perfect suggestions from voikkospell return nothing when checked in LibreOffice.

A pattern has emerged for the discrepancy between voikkospell and ooovoikko/LibreOffice:

In all cases where there is no suggestion in LibreOffice, the original input string contains the Latin character ö (U+00F6) instead of the corresponding Cyrillic ӧ (U+04E7).

Komi has two characters outside the standard Russian alphabet, of which the Cyrillic ӧ is one. Because most computers lack a proper Komi keyboard, people very often turn to the "insert char" window and pick the first ӧ-like character they find, which is often the Latin version (as used in e.g. Finnish, Swedish and German). This vowel is also one of the most frequent letters in Komi.

Mixing Latin and Cyrillic characters within a single word is thus a very frequent spelling error, and it needs to be handled properly, as the command-line tool does:

W: шевкнитчöма
S: шевкнитчӧма
S: шевкнит-ума
S: шевкнитча-а
S: шевкнитчыла
S: шувкнитчӧма

In cases where there is no such mix of scripts, LibreOffice provides suggestions as well:

W: юньсянь
S: аньсянь
S: люньсянь
S: оньсянь
S: Ӧньсянь
S: юнасянь
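
For anyone who wants to check their own documents for this kind of script mixing, here is a minimal sketch in Python. It is not part of voikko or ooovoikko and says nothing about how either handles the input internally; it only uses the standard unicodedata module to look at character names. The function names and the replacement table are my own assumptions, and the table covers only the precomposed Latin ö/Ö, not a plain "o" followed by a combining diaeresis.

```python
import unicodedata

def scripts_in_word(word):
    """Return the scripts used by the letters of `word`.

    The script is taken from the first word of each character's Unicode
    name, e.g. "CYRILLIC SMALL LETTER O WITH DIAERESIS" -> "CYRILLIC".
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(word):
    """True if the letters of `word` come from more than one script."""
    return len(scripts_in_word(word)) > 1

# Hypothetical replacement table: Latin o-diaeresis -> Cyrillic o-diaeresis.
LATIN_TO_CYRILLIC = str.maketrans({
    "\u00f6": "\u04e7",  # ö -> ӧ
    "\u00d6": "\u04e6",  # Ö -> Ӧ
})

def fix_o_diaeresis(word):
    """Replace precomposed Latin ö/Ö with the Cyrillic look-alikes."""
    return word.translate(LATIN_TO_CYRILLIC)

if __name__ == "__main__":
    bad = "шевкнитч\u00f6ма"  # the misspelling above, with Latin ö
    print(is_mixed_script(bad))
    print(fix_o_diaeresis(bad))
```

Running a word list through is_mixed_script() before spell-checking should show which words would hit the problem described above.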

Sjur



