[libvoikko] Follow up on BCP 47; make check

Harri Pitkänen hatapitk at iki.fi
Sun Apr 18 23:42:43 EEST 2010


SVN trunk of libvoikko now partially supports BCP 47 language tags in the new 
API. Partially means that in practice only tags of the form "fi", "fi-FI", 
"fi-x-something" and "fi-x-some-thing" are supported.

Note that while "fi-x-something" is not a valid tag (BCP 47 limits each 
private use subtag to eight characters) it is still supported. I decided to 
do the following:
- voikko_dict_variant still returns the variant name exactly as given in
  Language-Variant header of voikko-fi_FI.pro. No change here from previous
  version of libvoikko.
- Old initialization API still accepts the language codes as before.
- New initialization API (voikkoInit) accepts the variant name as a private
  use subtag. You may add the extra hyphens required by BCP 47 or leave them
  out. This does not cause ambiguities since the hyphen has not been an
  accepted character in variant names of the version 2 dictionary format. It
  is recommended that applications create the language tag by concatenating
  "fi-x-" and the string returned by voikko_dict_variant since this will also
  work with the version 3 dictionary format (which has not yet been specified).
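
As a rough illustration of the tag handling described in the list above, here
is a small Python sketch. It is not libvoikko code: the helper names are
hypothetical, and returning an empty string for a tag without a private use
part is an assumption made only for this example. It shows the recommended
concatenation, an optional strictly valid form that splits the variant into
subtags of at most eight characters, and why the extra hyphens can be dropped
again without ambiguity.

```python
def tag_from_variant(variant: str, language: str = "fi") -> str:
    """Build the tag as recommended: "<language>-x-<variant>"."""
    return f"{language}-x-{variant}"


def strict_tag_from_variant(variant: str, language: str = "fi") -> str:
    """Same, but split the variant into chunks of at most 8 characters
    so that every private use subtag is valid BCP 47."""
    chunks = [variant[i:i + 8] for i in range(0, len(variant), 8)]
    return "-".join([language, "x"] + chunks)


def variant_from_tag(tag: str) -> str:
    """Recover the variant name from a tag (hypothetical helper).

    Hyphens inside the private use part are simply removed, which is
    unambiguous as long as variant names never contain a hyphen.
    Returns "" when the tag has no private use part (an assumption).
    """
    _language, separator, private_use = tag.partition("-x-")
    if not separator:
        return ""
    return private_use.replace("-", "")


print(tag_from_variant("something"))         # fi-x-something
print(strict_tag_from_variant("something"))  # fi-x-somethin-g
print(variant_from_tag("fi-x-some-thing"))   # something
```

Both "fi-x-something" and "fi-x-some-thing" map back to the same variant name,
which is the property the new initialization API relies on.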

There is a naming inconsistency here since voikko_dict_variant corresponds to 
"private use subtag" of BCP 47, not "variant subtag". I chose to ignore this 
inconsistency since I don't believe we will have much use for BCP 47 variants 
and associating our medical vocabularies with term "private use" would likely 
confuse people who have not read the standard.

I also added two new functions to the API that allow applications to find out 
which languages are supported:
- voikkoListSupportedLanguages lists the currently supported languages. This
  means that at least spell checking will work with these languages. Now that
  I think of it, I need to rename this function to
  voikkoListSupportedSpellingLanguages to avoid confusion in the future. The
  languages are listed in a way that is suitable for typical applications that
  do not care about or cannot handle multiple dictionaries for one language. In
  practice the returned strings contain only the language subtag. Only for the
  few special cases where it is customary to have multiple options for one
  language shown in the user interface (for example en-US and en-GB) we may
  return codes containing language subtag AND region or script subtag.
- voikko_dict_language returns the language subtag for given voikko_dict.
  Similar functions still need to be added for region and script subtags but
  this does not necessarily need to happen for libvoikko 3.0.
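
The listing behaviour described above could be sketched roughly as follows.
This is only an illustration in Python, not how libvoikko implements it: the
function name, the input format (a list of dictionary tags) and the set of
languages that keep their region or script subtag are all assumptions made
for the example.

```python
# Hypothetical: languages for which several options are customarily shown
# in the user interface, so the second subtag (region or script) is kept.
MULTI_OPTION_LANGUAGES = {"en"}


def list_spelling_languages(dictionary_tags):
    """Reduce dictionary tags to the codes an application would list.

    Normally only the language subtag is returned, so several
    dictionaries for one language collapse into a single entry.
    """
    result = []
    for tag in dictionary_tags:
        subtags = tag.split("-")
        language = subtags[0].lower()
        shown = language
        keep_second = (
            language in MULTI_OPTION_LANGUAGES
            and len(subtags) > 1
            and subtags[1].lower() != "x"  # "x" starts the private use part
        )
        if keep_second:
            shown = f"{language}-{subtags[1]}"
        if shown not in result:
            result.append(shown)
    return result


print(list_spelling_languages(["fi-x-standard", "fi-x-medicine", "en-US", "en-GB"]))
# ['fi', 'en-US', 'en-GB']
```

An application that only shows one entry per language can use the returned
strings directly, without knowing anything about dictionary variants.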

Note that while the API now seems to support multiple languages, it is not 
actually possible to create a dictionary that would be advertised as 
containing anything other than Finnish. It is best to leave that to libvoikko 
3.1 or later. The important thing is that it is now possible to make 
applications (openoffice.org-voikko, mozvoikko and Enchant) fully language 
independent so that an update of libvoikko in the future is enough to enable 
the actual feature.

It is also still recommended that an installation of libvoikko comes with a 
hard dependency on Suomi-malaga. This is because we still support the old API which 
allowed developers to assume that support for Finnish spell checking and 
hyphenation is always present. At least openoffice.org-voikko still relies on 
this assumption and will behave in a suboptimal way if this is not the case.


While making these API changes I added some tests to the "check" target of the 
autotools build system; previously there were none. Not much is tested there 
yet but hopefully the situation will improve. Our own Debian packaging files 
in SVN now run this test suite as a part of the build process. It might be a 
good idea to do this in the "real" distribution packaging scripts too since 
the tests could catch some errors that would otherwise go unnoticed (runtime 
issues on rarely tested architectures). The test suite requires no additional 
dependencies compared to a normal build. It does not have a significant effect 
on build time either.

Harri


