[libvoikko] grammar checker checks

Sjur Moshagen sjurnm at mac.com
Thu Dec 12 23:20:55 EET 2013


To follow up on an old thread, but in a somewhat different direction:

28. sep. 2013 kl. 09:37 skrev Harri Pitkänen <hatapitk at iki.fi>:

> ... One of the most wanted improvements to Voikko is "integrate it to
> LibreOffice core". We are now closer than ever of being able to do that
> but there is still much work to be done.

This would be really great, but I have two questions/comments:

1) supported lexicons

What would the supported lexicons of Voikko-in-LO-core be? Would zhfst be one of the formats?

2) open vs closed list of supported languages

Hunspell has a couple of features that make it very easy to adopt, one being its plain-text format. Another, equally important detail is that the list of supported languages is undefined, i.e. completely open: the language is in the filename. On systems and in applications that support Hunspell speller dictionaries, one can use whatever language code one wants. If the language code is not recognised, the code itself will usually be displayed instead. This works great, and means that no work is required from outside parties to add support for a new language. Examples of systems and applications that work like this are InDesign CS 5.5+, Mac OS X, Linux in many flavors, etc.

But not LibreOffice, and by deliberate extension, not Voikko. This is unfortunate for a number of reasons:

* each time somebody wants support for a new language, a number of people will have to take action
** add to LO in several places
** add to LO-Voikko
* the language community will have to *ask* for a service from outsiders, creating an imbalance between different language communities that isn’t sound (see e.g. the first comment in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=70217)
* this creates an impression of a closed tool/environment, as opposed to Hunspell, which is open to everyone (edit your file(s), name it/them correctly, and off you go)
* this closedness puts people off, because they don’t understand what is going on. We have already seen this a couple of times during the zhfst beta period: people install a zhfst file and expect it to work. It doesn’t, because LO-Voikko doesn’t recognise the language, which in turn is because LO doesn’t recognise the language. But to the end user this logic is nonsense: the file is in the proper location, in the proper format, and with a proper filename, so it *should* work (just like Hunspell files, or any other files they regularly meet).

The solution:

The proper solution is of course to do it like Hunspell (or close to it): let the language/locale be deduced from the filename (and verified against the index.xml content) - and that’s it. No hard-coded list of known languages: as long as the locale id in the filename and metadata follows certain standards for locale identification, the file should be accepted. If the language/locale is unknown to the system, just present the language/locale code as is. Even better (than Hunspell): since the zhfst file format has a metadata store, it is easy to provide a human-readable version of the language name to present in menus and the like, even in the cases where the language code is unknown to the host system/application.
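To make the proposal concrete, here is a minimal sketch in Python of the lookup described above. It is only an illustration, not LO-Voikko code: the function name `resolve_language`, the `host_names` table, the `<locale>.zhfst` filename pattern, the simplified locale regex, and the assumption that index.xml carries `<info><locale>` and `<info><title>` elements are all assumptions made for the example. The index.xml content is passed in as a string; reading it out of the zhfst archive is left out.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Simplified shape check: language[-Script][-REGION].
# This is an assumption for the sketch, not a full locale-tag parser.
LOCALE_RE = re.compile(
    r"^[a-z]{2,3}([-_][A-Za-z]{4})?([-_]([A-Za-z]{2}|[0-9]{3}))?$"
)


def resolve_language(filename, index_xml, host_names):
    """Return (locale, display_name), or None if the file is rejected.

    filename   -- e.g. "sma.zhfst" (hypothetical naming pattern)
    index_xml  -- text of the speller's index.xml (assumed layout)
    host_names -- names the host application already knows, by locale
    """
    # 1. Deduce the locale from the filename.
    locale = PurePath(filename).stem
    if not LOCALE_RE.match(locale):
        return None

    # 2. Verify it against the metadata.
    info = ET.fromstring(index_xml).find("info")
    meta_locale = info.findtext("locale") if info is not None else None
    if meta_locale is None or meta_locale.strip() != locale:
        return None

    # 3. Pick a display name: host name, else metadata title, else the code.
    title = (info.findtext("title") or "").strip()
    return locale, host_names.get(locale) or title or locale
```

With this shape there is no list of known languages anywhere: a language the host has never heard of is still accepted, and is shown under the title from its own metadata or, failing that, under its bare code.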

I understand that LO is a big beast, and that it will take time to implement something like this. But I see no reason for LO-Voikko to continue along the present path. Changing it would at least remove one hurdle to getting any new language supported. And I hope that the present work on getting Voikko supported in the LO core could also move LO in that direction.

Sjur


