[libvoikko] Skolt Sami upper-case bug in libvoikko?

Harri Pitkänen hatapitk at iki.fi
Thu Mar 12 17:49:59 EET 2015


On Wednesday 11 March 2015 17:47:40 Trosterud Trond wrote:
> If I understand you correctly the upper limit is 0x01EF.

Not everything below 0x01EF is converted, but I believe most of the gaps in 
that range are punctuation or control characters.

> The characters 0x0218-0x021B are in use for Romanian. This is not exactly
> our focus, but we have a (bad) analyser (and speller) for it.
 
> 0x021E-0x021F are for Finnish Romani. Not on our plate.

Added 0x01F8 - 0x021F (continuous range of upper-lower pairs).

> There is a small/capital Skolt Saami pair where one is in Latin B and the
> other one outside of it:
 
> U+0292 ʒ (small) = U+01B7 Ʒ

Added this as a special case.
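
For illustration only, the two additions above (the continuous 0x01F8 - 0x021F
range and the U+0292 / U+01B7 special case) could be sketched roughly as
follows. This is not libvoikko's actual code. The names PAIR_RANGE_START,
PAIR_RANGE_END, SPECIAL_LOWER_TO_UPPER, simple_upper and simple_lower are
hypothetical, and the sketch assumes that within the range every even code
point is the upper-case letter and the following odd code point is its
lower-case partner.

```python
# Hypothetical sketch of range-based case conversion with one special case.
# Not taken from libvoikko; all names here are made up for illustration.

# Continuous block of upper/lower pairs: even code point = upper case,
# the next odd code point = its lower-case partner (assumption stated above).
PAIR_RANGE_START = 0x01F8
PAIR_RANGE_END = 0x021F

# Pairs whose two halves are not adjacent, so they need an explicit table.
# U+0292 (small ezh) maps to U+01B7 (capital ezh).
SPECIAL_LOWER_TO_UPPER = {0x0292: 0x01B7}
SPECIAL_UPPER_TO_LOWER = {u: l for l, u in SPECIAL_LOWER_TO_UPPER.items()}


def simple_upper(ch: str) -> str:
    """Return the upper-case form of a single character, or ch unchanged."""
    cp = ord(ch)
    if cp in SPECIAL_LOWER_TO_UPPER:
        return chr(SPECIAL_LOWER_TO_UPPER[cp])
    if PAIR_RANGE_START <= cp <= PAIR_RANGE_END and cp % 2 == 1:
        return chr(cp - 1)  # odd (lower) -> preceding even (upper)
    return ch


def simple_lower(ch: str) -> str:
    """Return the lower-case form of a single character, or ch unchanged."""
    cp = ord(ch)
    if cp in SPECIAL_UPPER_TO_LOWER:
        return chr(SPECIAL_UPPER_TO_LOWER[cp])
    if PAIR_RANGE_START <= cp <= PAIR_RANGE_END and cp % 2 == 0:
        return chr(cp + 1)  # even (upper) -> following odd (lower)
    return ch
```

Keeping the special case in a small table rather than widening the range
avoids accidentally converting unrelated characters that sit between U+01B7
and U+0292.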

> Hmm, then there is cyrillic, but since that works, it means that you must
> have U+0400 - U+04FF already (we certainly do not have all pairs, but we do
> have much more than the Russian ones).

Yes, most of those are already supported. A few are missing, but they don't 
look like letters to me.

Harri

