[libvoikko] Follow up on BCP 47; make check

Sjur Moshagen sjurnm at mac.com
Mon Apr 19 13:08:40 EEST 2010


On 18 Apr 2010, at 23:42, Harri Pitkänen wrote:

> SVN trunk of libvoikko now partially supports BCP 47 language tags in the new 
> API. Partially means that in practice only tags in format "fi", "fi-FI",
> "fi-x-something" and "fi-x-some-thing" are supported.

Nice. Do you also plan to support the script subtag in the future?
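
To make the question concrete, here is a rough sketch (in Python, not libvoikko code) of how a tag could be split into language, script, region and private use parts. The name parse_tag and the returned dictionary are purely hypothetical, and the pattern is a simplification: extlang, variant and extension subtags are not handled, and subtag length limits are not enforced.

```python
import re

# Simplified pattern: language, optional script, optional region,
# optional private use part ("x-...").
# Assumption: extlang, variant and extension subtags are deliberately ignored,
# and the 8-character subtag limit of BCP 47 is not enforced here.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-x(?P<private>(?:-[A-Za-z0-9]+)+))?$"
)


def parse_tag(tag):
    """Split a tag into its parts; return None if the tag does not match."""
    m = _TAG.match(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    private = m.group("private")
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        # Drop the leading "-" that follows the "x" singleton.
        "private": private[1:].lower() if private else None,
    }
```

With this sketch, "fi-x-some-thing" would give the private use part "some-thing", and a tag such as "sr-Latn-RS" would give script "Latn" and region "RS".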

> There is a naming inconsistency here since voikko_dict_variant corresponds to 
> "private use subtag" of BCP 47, not "variant subtag". I chose to ignore this 
> inconsistency since I don't believe we will have much use for BCP 47 variants 
> and associating our medical vocabularies with term "private use" would likely 
> confuse people who have not read the standard.

If the marriage with HFST turns out to be successful (which I hope and believe), it could mean that a need for BCP 47 variants will arise sooner rather than later.

> I also added two new functions to the API that allow applications to find out 
> which languages are supported:
> - voikkoListSupportedLanguages lists the currently supported languages. This
>  means that at least spell checking will work with these languages. Now that
>  I think of it, I need to rename this function to
>  voikkoListSupportedSpellingLanguages to avoid confusion in the future. The
>  languages are listed in a way that is suitable for typical applications that
>  do not care or cannot handle multiple dictionaries for one language. In
>  practice the returned strings contain only the language subtag. Only for the
>  few special cases where it is customary to have multiple options for one
>  language shown in the user interface (for example en-US and en-GB) we may
>  return codes containing language subtag AND region or script subtag.

This might be problematic on platforms where it is customary to always (or usually) provide the full language + region + script tag. The present Sámi tools are presented as five distinct language + region combinations in MS Office for Windows (not on the Mac, though):

se-FI
se-NO
se-SE
smj-NO
smj-SE

(Actually, MS Office does not use ISO language and country/region codes, but the interpretation is the same.)

These are the five choices the users have, although there are actually only two spellers: North Sámi (se) and Lule Sámi (smj). To me it looks like your decision might make it problematic to support the Sámi spellers in MS Office on Windows. Or am I wrong?
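
As an illustration of the kind of fallback the integration would need, here is a minimal sketch in Python. It assumes that the list of supported languages contains only "se" and "smj", and that the application asks for a full language + region tag. The function name find_speller is hypothetical and not part of libvoikko; the approach is loosely modelled on truncating the tag from the right until something matches.

```python
def find_speller(requested, available):
    """Return the most specific available tag matching `requested`, or None.

    Subtags are dropped from the right one at a time, so "se-FI" falls
    back to "se" when no region-specific speller exists.
    """
    # Compare case-insensitively, but return the tag as it was listed.
    lookup = {tag.lower(): tag for tag in available}
    parts = requested.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in lookup:
            return lookup[candidate]
        parts.pop()
    return None


# The five choices shown in MS Office, and the two spellers that exist.
office_choices = ["se-FI", "se-NO", "se-SE", "smj-NO", "smj-SE"]
spellers = ["se", "smj"]

# Each Office choice resolved to the speller that would serve it.
resolved = {tag: find_speller(tag, spellers) for tag in office_choices}
```

Here the three se-* choices resolve to "se" and the two smj-* choices to "smj". The open question is whether this mapping should live in libvoikko or in each application integration.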

Best regards,
Sjur



