[libvoikko] Using BCP 47 language tags in libvoikko
Flammie Pirinen
flammie at iki.fi
Tue Apr 13 07:29:03 EEST 2010
2010-04-12, Harri Pitkänen wrote:
> After having considered this again I'm starting to think that
> splitting the parameter in two parts is not necessary and would even
> limit things in the future. Instead of doing that I'm now proposing
> that we adopt IETF BCP 47 language tags for identifying the available
> vocabularies:
>
> http://tools.ietf.org/rfc/bcp/bcp47.txt
I agree that BCP 47 is the most suitable standard for naming
languages. It has been in use in its various incarnations for a
reasonably long time, and I have yet to see real shortcomings in any
of its applications.
> Unfortunately length of an individual private use subtag is limited
> to eight characters which is an additional limitation to our previous
> rules for langcode. This can be worked around by adding multiple
> private use subtags which will lead to rather weird compatibility
> mappings between the old and new API:
>
> reallylongvariantname <-> fi-x-reallylo-x-ngvarian-x-tname
I might be reading the ABNF wrong, but doesn't
privateuse = "x" 1*("-" (1*8alphanum))
mean that you could as well use fi-x-really-long-variant-name (or
reallylo-ngvarian-tname assuming automatic mapping, of course)?
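To make the automatic mapping concrete, here is a minimal sketch in
Python. It assumes the old variant name is plain ASCII alphanumeric
and that the chunks are simply cut every eight characters. The
function names legacy_to_tag and tag_to_legacy are made up for
illustration and are not part of libvoikko.

```python
import re


def legacy_to_tag(language, variant):
    """Map an old-style variant name to a tag with one private-use sequence.

    Hypothetical helper: only one "x" singleton is emitted, followed by
    subtags of at most eight characters each, as the ABNF rule allows.
    """
    if not re.fullmatch(r"[A-Za-z0-9]+", variant):
        raise ValueError("variant must be ASCII alphanumeric")
    # Cut the name into chunks of at most eight characters.
    chunks = [variant[i:i + 8] for i in range(0, len(variant), 8)]
    return "-".join([language, "x"] + chunks)


def tag_to_legacy(tag):
    """Reverse the mapping by joining the private-use subtags again."""
    language, separator, private = tag.partition("-x-")
    if not separator:
        raise ValueError("tag has no private-use sequence")
    return language, private.replace("-", "")


print(legacy_to_tag("fi", "reallylongvariantname"))
# fi-x-reallylo-ngvarian-tname
```

The reverse mapping only works if hyphens in the private-use part are
never meaningful, which is an assumption of this sketch.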
> One benefit of using BCP 47 is that it incorporates RFC 4647
> (Matching of Language Tags) which would provide us with an algorithm
> for filtering available dictionaries (voikko_list_dicts) and looking
> up most appropriate vocabulary when incomplete language tag is
> specified in voikkoInit.
I haven't checked the algorithm, but I suppose it can do something
reasonable if you have, e.g., only an HFST variant and a medical
variant available. Of course, in the end a good user interface is
always required.
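For reference, a rough sketch of the two RFC 4647 schemes that seem
relevant here: basic filtering (section 3.3.1) for listing
dictionaries and lookup (section 3.4) for picking one. The function
names and the tags fi-x-hfst and fi-x-medicine are hypothetical, and
the sketch leaves out wildcard handling in lookup.

```python
def filter_tags(available, language_range):
    """Basic filtering: keep tags equal to the range or prefixed by it."""
    wanted = language_range.lower()
    if wanted == "*":
        return list(available)
    return [tag for tag in available
            if tag.lower() == wanted or tag.lower().startswith(wanted + "-")]


def lookup(available, language_range, default=None):
    """Lookup: truncate the range from the end until one tag matches."""
    by_lower = {tag.lower(): tag for tag in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # A single-character subtag such as "x" is removed together with
        # the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# Hypothetical situation: only the HFST and medical variants are installed.
available = ["fi-x-hfst", "fi-x-medicine"]
print(filter_tags(available, "fi"))  # ['fi-x-hfst', 'fi-x-medicine']
print(lookup(available, "fi"))       # None
```

So with only the two variants installed, filtering on "fi" finds both,
but lookup on plain "fi" falls through to the default, because lookup
only shortens the requested range and never extends it. Choosing the
default in that case would be up to the application.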
--
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>