[libvoikko] Follow up on BCP 47; make check

Harri Pitkänen hatapitk at iki.fi
Mon Apr 19 17:50:01 EEST 2010

On Monday 19 April 2010, Sjur Moshagen wrote:
> Den 18. apr. 2010 kl. 23.42 skrev Harri Pitkänen:
> > SVN trunk of libvoikko now partially supports BCP 47 language tags in the
> > new API. Partially means that in practice only tags in format "fi",
> > "fi-FI", "fi-x-something" and "fi-x-some-thing" are supported.
> Nice. Do you also plan to support the script subtag in the future?

Yes, once they are needed.
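
To make the currently supported tag shapes concrete, here is a rough sketch in
Python. It is not the libvoikko parser; parse_tag and its return value are
hypothetical names used only for illustration, and accepting a region together
with a private use part (for example "fi-FI-x-something") is an assumption of
the sketch.

```python
import re

# Assumptions: the language is 2-3 letters, the region is 2 letters, and the
# private use part is "x" followed by one or more alphanumeric subtags.
# BCP 47 limits each subtag to 8 characters; this sketch does not enforce
# that, because the "something" example in this thread is longer.
_TAG_RE = re.compile(
    r"([a-z]{2,3})"            # primary language subtag
    r"(?:-([a-z]{2}))?"        # optional region subtag
    r"(?:-x((?:-[a-z0-9]+)+))?",  # optional private use part
    re.IGNORECASE,
)


def parse_tag(tag):
    """Return (language, region, private_use_subtags), or None if unsupported."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    language = match.group(1).lower()
    region = match.group(2).upper() if match.group(2) else None
    if match.group(3):
        private_use = match.group(3).lower().lstrip("-").split("-")
    else:
        private_use = []
    return language, region, private_use
```

A tag with a script subtag, such as "fi-Latn-FI", is rejected by this sketch
and parse_tag returns None for it.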

> > There is a naming inconsistency here since voikko_dict_variant
> > corresponds to "private use subtag" of BCP 47, not "variant subtag". I
> > chose to ignore this inconsistency since I don't believe we will have
> > much use for BCP 47 variants and associating our medical vocabularies
> > with the term "private use" would likely confuse people who have not read
> > the standard.
> If the marriage with HFST turns out to be successful (which I hope and
>  believe), it could mean that this need will arise sooner rather than
>  later.

Are there any details available on which language might need BCP 47 variants
in the near future? While the API can easily be extended in any release, we
need to take this into account when changing the dictionary format, which might
happen by the end of 2010.

> This might be problematic on platforms where it is customary to always (or
>  usually) provide the whole language + region + script tag. The present
>  Sámi tools are presented as five distinct language+region combinations in
>  MS Office for Windows (not on the Mac, though):
> se-FI
> se-NO
> se-SE
> smj-NO
> smj-SE
> (Actually, MS Office does not use ISO language and country/region codes, but
>  the interpretation is the same.)
> These are the five choices the users have, although there are only two
>  actual spellers: North Sámi (se) and Lule Sámi (smj). To me it looks like
>  your decision might make it problematic to support the Sámi spellers in MS
>  Office on Windows. Or am I wrong?

It will always be possible to call voikko_list_dicts to get full information 
about all available dictionaries. So the problem would not be anything that 
could not be solved with an extra for loop in the code outside libvoikko.
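
As a rough idea of what such a loop could look like, here is a sketch in
Python. It does not call libvoikko: the Dictionary tuple is a hypothetical
stand-in for the entries an application would get from voikko_list_dicts, and
matching on the primary language subtag only is an assumption about what the
application wants.

```python
from collections import namedtuple

# Hypothetical stand-in for one entry of the dictionary list.
Dictionary = namedtuple("Dictionary", ["language", "variant"])


def find_dictionaries(requested_tag, dictionaries):
    """Return the dictionaries whose language equals the primary subtag."""
    primary = requested_tag.split("-")[0].lower()
    matches = []
    for dictionary in dictionaries:  # the "extra for loop"
        if dictionary.language.lower() == primary:
            matches.append(dictionary)
    return matches


# Illustrative data only: two installed spellers, five application choices.
installed = [Dictionary("se", "standard"), Dictionary("smj", "standard")]
office_tags = ["se-FI", "se-NO", "se-SE", "smj-NO", "smj-SE"]
mapping = {tag: find_dictionaries(tag, installed) for tag in office_tags}
```

With this data all three "se-*" choices resolve to the same North Sámi entry
and both "smj-*" choices to the Lule Sámi entry.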

The problem is that a user might have installed dictionaries such as "sv-SE" and
"sv-FI" and some applications might not know how to present these two to the 
user. The application might show two identical entries for "Swedish" or just 
pick one at random. voikkoListSupportedLanguages was designed to help such 
simple applications by listing just the language and relying on local 
configuration for picking the correct regional variant.
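
The idea can be sketched in Python as follows. This is not how libvoikko
implements it: supported_languages, pick_dictionary and the preferred_regions
mapping (standing in for the local configuration) are hypothetical names, and
falling back to the first candidate is an assumption of the sketch.

```python
def supported_languages(dictionary_tags):
    """List each primary language once, in order of first appearance."""
    seen = []
    for tag in dictionary_tags:
        language = tag.split("-")[0].lower()
        if language not in seen:
            seen.append(language)
    return seen


def pick_dictionary(language, dictionary_tags, preferred_regions):
    """Pick one tag for a language, preferring the locally configured region."""
    candidates = [
        tag for tag in dictionary_tags
        if tag.split("-")[0].lower() == language
    ]
    preferred = preferred_regions.get(language)
    for tag in candidates:
        if preferred and tag.upper().endswith("-" + preferred.upper()):
            return tag
    # Assumption: without a configured preference, take the first candidate.
    return candidates[0] if candidates else None
```

For the tags "sv-SE", "sv-FI" and "fi", a simple application would see only
"sv" and "fi", and a local preference of "FI" for "sv" would select "sv-FI".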

