[libvoikko] Using BCP 47 language tags in libvoikko

Flammie Pirinen flammie at iki.fi
Tue Apr 13 07:29:03 EEST 2010

2010-04-12, Harri Pitkänen wrote:

> After having considered this again I'm starting to think that
> splitting the parameter in two parts is not necessary and would even
> limit things in the future. Instead of doing that I'm now proposing
> that we adopt IETF BCP 47 language tags for identifying the available
> vocabularies:
>   http://tools.ietf.org/rfc/bcp/bcp47.txt

I would agree that BCP 47 is the most suitable standard for naming
languages. It has been in use in its various incarnations for a
reasonably long time, and I have yet to see any real shortcomings in
any of its applications.

> Unfortunately length of an individual private use subtag is limited
> to eight characters which is an additional limitation to our previous
> rules for langcode. This can be worked around by adding multiple
> private use subtags which will lead to rather weird compatibility
> mappings between the old and new API:
>   reallylongvariantname <-> fi-x-reallylo-x-ngvarian-x-tname

I might be reading the ABNF wrong, but doesn't

  privateuse    = "x" 1*("-" (1*8alphanum))

mean that you could just as well use fi-x-really-long-variant-name (or
fi-x-reallylo-ngvarian-tname, assuming automatic mapping, of course)?
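
To make the reading of the grammar concrete, here is a minimal Python
sketch that checks a private use sequence against the production quoted
above and shows one possible automatic mapping. The function names and
the chunk-by-eight mapping are assumptions made for illustration; they
are not part of libvoikko.

```python
import re

# privateuse = "x" 1*("-" (1*8alphanum))
# A single "x" introduces the sequence; each following subtag is 1 to 8
# alphanumeric characters. Language tags are case-insensitive.
PRIVATEUSE = re.compile(r"x(-[a-z0-9]{1,8})+", re.IGNORECASE)


def is_valid_privateuse(sequence: str) -> bool:
    """Return True if the whole string matches the privateuse production."""
    return PRIVATEUSE.fullmatch(sequence) is not None


def variant_to_privateuse(name: str, width: int = 8) -> str:
    """Hypothetical mapping: cut an old-style variant name into chunks of
    at most `width` characters and put them after one "x" singleton."""
    if not name or not name.isascii() or not name.isalnum():
        raise ValueError("variant name must be non-empty ASCII alphanumerics")
    chunks = [name[i:i + width] for i in range(0, len(name), width)]
    return "x-" + "-".join(chunks)


if __name__ == "__main__":
    print(is_valid_privateuse("x-really-long-variant-name"))
    print(is_valid_privateuse("x-reallylongvariantname"))
    print("fi-" + variant_to_privateuse("reallylongvariantname"))
```

Under this reading one "x" is enough for the whole sequence. The form
with a repeated "x" also matches the production, since "x" is itself a
valid one-character subtag, but it is not required.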

> One benefit of using BCP 47 is that it incorporates RFC 4647
> (Matching of Language Tags) which would provide us with an algorithm
> for filtering available dictionaries (voikko_list_dicts) and looking
> up the most appropriate vocabulary when an incomplete language tag is
> specified in voikkoInit.
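
For reference, a rough Python sketch of the two RFC 4647 schemes
mentioned here: basic filtering (section 3.3.1) and lookup (section
3.4). The function names and the example tags are made up for
illustration, wildcard handling inside lookup is left out, and nothing
below reflects how libvoikko actually does or would implement this.

```python
def basic_filter(language_range, available):
    """RFC 4647 basic filtering: a range matches a tag if they are equal
    (ignoring case) or the tag starts with the range followed by "-".
    The range "*" matches every tag."""
    wanted = language_range.lower()
    if wanted == "*":
        return list(available)
    return [tag for tag in available
            if tag.lower() == wanted or tag.lower().startswith(wanted + "-")]


def lookup(language_range, available, default=None):
    """RFC 4647 lookup: truncate the range from the end, one subtag at a
    time, until it equals an available tag. A single-character subtag
    such as "x" is dropped together with the subtag that followed it."""
    by_lower = {tag.lower(): tag for tag in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


if __name__ == "__main__":
    dicts = ["fi-x-hfst", "fi-x-medicine"]   # hypothetical tags
    print(basic_filter("fi", dicts))          # both tags start with "fi-"
    print(lookup("fi-x-hfst-extra", dicts))   # truncates to fi-x-hfst
    print(lookup("fi", dicts, default=None))  # no plain "fi" tag exists
```

Note the asymmetry: filtering with the range "fi" finds both variants,
while lookup with "fi" finds nothing unless a plain "fi" tag (or a
default) is provided, because lookup only ever shortens the requested
range.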

I haven't checked the algorithm, but I suppose that it can do
something reasonable if you have, e.g., only an HFST variant and a
medical variant available. Of course, in the end a good user interface
is always

Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>
