[libvoikko] Follow up on BCP 47; make check
Sjur Moshagen
sjurnm at mac.com
Mon Apr 19 13:08:40 EEST 2010
Den 18. apr. 2010 kl. 23.42 skrev Harri Pitkänen:
> SVN trunk of libvoikko now partially supports BCP 47 language tags in the new
> API. Partially means that in practice only tags in format "fi", "fi-FI",
> "fi-x-something" and "fi-x-some-thing" are supported.
Nice. Do you plan to support also the script subtag in the future?
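For illustration, here is a minimal Python sketch of a parser that accepts exactly the four tag shapes quoted above and rejects anything else, including a script subtag such as "fi-Latn". The names ParsedTag and parse_tag are hypothetical. This is not libvoikko's own parsing code. The private_use field holds the text after "-x-".

```python
import re
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    language: str               # primary language subtag, e.g. "fi"
    region: Optional[str]       # region subtag, e.g. "FI", or None
    private_use: Optional[str]  # text after "-x-", e.g. "some-thing", or None


# Accepts only "ll", "ll-RR" and "ll-x-..." (hypothetical sketch).
# BCP 47 limits each private-use subtag to 8 characters; that limit is not
# enforced here so that the placeholder "something" (9 letters) still matches.
_TAG_RE = re.compile(
    r"(?P<lang>[a-z]{2,3})"
    r"(?:-(?P<region>[a-z]{2})"
    r"|-x-(?P<priv>[a-z0-9]+(?:-[a-z0-9]+)*))?"
)


def parse_tag(tag: str) -> Optional[ParsedTag]:
    """Split a supported tag into its parts; return None for unsupported shapes."""
    m = _TAG_RE.fullmatch(tag.lower())  # BCP 47 tags are case-insensitive
    if m is None:
        return None
    region = m.group("region")
    return ParsedTag(m.group("lang"), region.upper() if region else None, m.group("priv"))


print(parse_tag("fi-x-some-thing"))  # ParsedTag(language='fi', region=None, private_use='some-thing')
print(parse_tag("fi-Latn"))          # None (script subtags are not handled)
```

Supporting the script subtag would mean adding an optional four-letter group between the language and region parts.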
> There is a naming inconsistency here since voikko_dict_variant corresponds to
> "private use subtag" of BCP 47, not "variant subtag". I chose to ignore this
> inconsistency since I don't believe we will have much use for BCP 47 variants
> and associating our medical vocabularies with the term "private use" would likely
> confuse people who have not read the standard.
If the marriage with HFST turns out to be successful (which I hope and believe), it could mean that this need will arise sooner rather than later.
> I also added two new functions to the API that allow applications to find out
> which languages are supported:
> - voikkoListSupportedLanguages lists the currently supported languages. This
> means that at least spell checking will work with these languages. Now that
> I think of it, I need to rename this function to
> voikkoListSupportedSpellingLanguages to avoid confusion in the future. The
> languages are listed in a way that is suitable for typical applications that
> do not care or cannot handle multiple dictionaries for one language. In
> practice the returned strings contain only the language subtag. Only for the
> few special cases where it is customary to have multiple options for one
> language shown in the user interface (for example en-US and en-GB) we may
> return codes containing language subtag AND region or script subtag.
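The listing rule described above can be sketched in Python as follows. Private-use variants collapse to the bare language subtag, and a region is kept only for languages where separate regional choices are customary. The function name and the SHOW_REGION_FOR set are hypothetical and are not libvoikko API. The input list stands in for the installed dictionaries, and only region subtags (not script subtags) are handled.

```python
from typing import Iterable, List

# Hypothetical: languages whose regional variants are customarily offered
# as separate choices in user interfaces (the en-US / en-GB case above).
SHOW_REGION_FOR = {"en"}


def list_spelling_languages(dictionary_tags: Iterable[str]) -> List[str]:
    """Collapse dictionary tags into the codes a typical application would show.

    Input tags are assumed to look like "ll", "ll-RR" or "ll-x-variant".
    """
    codes = set()
    for tag in dictionary_tags:
        parts = tag.split("-")
        language = parts[0].lower()
        # A region subtag is two letters; "x" (private use) does not qualify.
        has_region = len(parts) > 1 and len(parts[1]) == 2 and parts[1].isalpha()
        if language in SHOW_REGION_FOR and has_region:
            codes.add(f"{language}-{parts[1].upper()}")
        else:
            codes.add(language)  # e.g. "fi-x-medicine" and "fi-FI" become "fi"
    return sorted(codes)


print(list_spelling_languages(["fi", "fi-x-medicine", "en-US", "en-GB"]))
# ['en-GB', 'en-US', 'fi']
```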
Returning only the language subtag might be problematic on platforms where it is customary to always (or usually) provide the full language + region + script tag. The present Sámi tools are presented as five distinct language+region combinations in MS Office for Windows (not on the Mac, though):
se-FI
se-NO
se-SE
smj-NO
smj-SE
(Actually, MS Office does not use ISO language and country/region codes, but the interpretation is the same.)
These are the five choices users have, although there are only two actual spellers: North Sámi (se) and Lule Sámi (smj). It looks to me as if your decision could make it problematic to support the Sámi spellers in MS Office on Windows. Or am I wrong?
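One possible way to reconcile the two models is for an integration layer to map each language+region locale back to a speller by dropping subtags from the end, in the spirit of the "Lookup" scheme of RFC 4647. The following is a simplified Python sketch that omits wildcard ranges and default values. The function lookup_speller and the speller list are hypothetical. Whether this fits the way MS Office on Windows registers proofing tools is not something the sketch establishes.

```python
from typing import Iterable, Optional


def lookup_speller(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the available tag that best matches `requested`, or None.

    Simplified RFC 4647 Lookup: try the full tag, then repeatedly drop the
    last subtag (also dropping a dangling single-letter subtag such as "x").
    """
    by_lower = {tag.lower(): tag for tag in available}  # matching is case-insensitive
    parts = requested.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()
        if parts and len(parts[-1]) == 1:
            parts.pop()
    return None


# The five locales offered in MS Office for Windows, served by two spellers.
for locale in ["se-FI", "se-NO", "se-SE", "smj-NO", "smj-SE"]:
    print(locale, "->", lookup_speller(locale, ["se", "smj"]))
```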
Best regards,
Sjur