[libvoikko] Voikko, cyrillic and case handling

Sjur Moshagen sjurnm at mac.com
Fri Jan 27 22:34:52 EET 2012


On 27 Jan 2012, at 19:45, Harri Pitkänen wrote:

> So my guess is that your build of libreoffice-voikko is linked against
> outdated hfst-ospell.

Most likely it is.

> Try building it again with latest version and see if
> that helps.

I will.

> By the way, does hfst-ospell or your Komi transducer support canonically
> decomposed forms of these Unicode characters? Normally Cyrillic ö is
> written as 04E7 but it can also be decomposed as 043E 0308. Libvoikko does
> automatic conversion of decomposed forms into the more widely used
> composed form for Latin letters so that underlying morphologies don't have
> to care about this issue. This is not yet done for Cyrillic letters
> though. If you don't already support decomposed forms let me know and I
> can add the necessary mappings.

No, we don't support decomposed forms in the speller, and would actually prefer not to. So doing the conversion in the libvoikko code would be excellent.
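
To illustrate the kind of conversion I have in mind, here is a minimal Python sketch. It is not libvoikko code: it assumes that standard Unicode NFC normalization is a reasonable stand-in for the mappings you would add, and the helper name to_composed is made up for this example.

```python
import unicodedata


def to_composed(word: str) -> str:
    """Return the canonically composed (NFC) form of a word.

    Hypothetical helper for illustration only; libvoikko's own
    mapping tables may be implemented quite differently.
    """
    return unicodedata.normalize("NFC", word)


# Cyrillic o followed by a combining diaeresis (043E 0308) ...
decomposed = "\u043e\u0308"
# ... and the composed letter (04E7) that the speller expects.
composed = "\u04e7"

# The decomposed input is folded into the composed form.
assert to_composed(decomposed) == composed
# Input that is already composed passes through unchanged.
assert to_composed(composed) == composed
```

With a step like this in front of the speller, the transducer only ever sees the composed letters.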

This also means that supporting characters that are *only* available as precomposed letters is no problem, I guess. There are a couple of such characters in Kildin Sámi, another Cyrillic-script language we are working with (which already has basic speller support in LibreOffice).

Sjur



