[libvoikko] Voikko and upcoming changes to Firefox extensions

Henri Sivonen hsivonen at hsivonen.fi
Wed Apr 5 13:07:04 EEST 2017


On Mon, Apr 3, 2017 at 3:39 PM, Tino Didriksen <mail at tinodidriksen.com> wrote:
> But, as long as it's any form of OSI open source it's fine, because they are
> plain data files for aggregation - there is no mixed binaries or linking
> involved.

My original questions about Voikko licensing got Warnocked, but
assuming that my reading (as expressed in this thread) is correct, it
seems that
 1) None of the obvious technical solutions for continued
Firefox/Voikko interop are ruled out by licensing.
 2) Mozilla most likely would not accept the GPLed dictionaries in the
locale-specific Firefox repacks (e.g. an install package that comes
with Finnish UI string language pack pre-bundled), but distributing
them via addons.mozilla.org should be fine (there are already GPLed
Hunspell dictionaries there).

(Questions buried at the end of the email.)

Currently, I'm thinking of the potential solution space like this
(note: at this time this is just me thinking out loud--no Mozilla
commitment implied):

1) Not doing anything.

Pros: No effort required!

Cons: Not a solution. Users of Finnish and Greenlandic spellchecking
in Firefox would lose it. Users of Firefox don't gain Sami
spellchecking.

2) Vendoring libvoikko and/or hfst-ospell into mozilla-central and
statically linking them into Gecko on desktop platforms. Extending the
dictionary Firefox extension type to be able to carry
Voikko/hfst-ospell dictionary data (currently these extensions carry
Hunspell data).

Pros:
* The same solution as with Hunspell.
* The least engineering effort.
* The fewest moving parts.
* One solution applies to all desktop platforms. (Android has spell
checking as part of the input method anyway.)
* Mozilla could push updates in case of critical bugs (as part of
Firefox itself).
* No need for Mozilla to instruct users to obtain unsandboxed native
executable code from non-operating system, non-mozilla.org
distribution points.

Cons:
* Gecko code size is increased for a reason that benefits relatively
few users.
* Bugs in libvoikko/hfst-ospell could crash the Firefox UI process or
lead to remote code execution, since the UI process isn't sandboxed.

3) Mozilla building libvoikko and/or hfst-ospell as a shared object
and staging it for distribution on a CDN. Extending the dictionary
Firefox extension type to be able to carry Voikko/hfst-ospell
dictionary data. Using the OpenH264 download and update mechanism to
download the libvoikko/hfst-ospell shared object when a
Voikko/hfst-ospell dictionary is installed.

Pros:
* No code size increase for users for whom libvoikko and/or
hfst-ospell isn't relevant.
* One solution applies to all desktop platforms. (Android has spell
checking as part of the input method anyway.)
* Mozilla could push updates in case of critical bugs (as part of
Firefox itself).
* No need for Mozilla to instruct users to obtain unsandboxed native
executable code from non-operating system, non-mozilla.org
distribution points.

Cons:
* Even though the infrastructure for downloading and updating shared
objects for Firefox to load at runtime already exists, experience
shows that adding another kind of downloaded component requires
attention and effort from front end and release engineering.
* In addition to front end and release engineering dependencies, there
would be more Gecko engineering involved than in scenario #2.
* Bugs in libvoikko/hfst-ospell could crash the Firefox UI process or
lead to remote code execution, since the UI process isn't sandboxed.

4) Mozilla engineering Gecko to dynamically load libvoikko/hfst-ospell
if it is found in the system library path. Leave locating dictionaries
up to libvoikko/hfst-ospell.

Pros:
* No code size increase for users for whom libvoikko and/or
hfst-ospell isn't relevant.
* No need for Mozilla to involve multiple teams to add a
Mozilla-managed downloadable.
* All desktop platforms would be potentially addressable even if the
user experience on some platforms would be worse than in the case of
Hunspell-supported languages.

Cons:
* Worse UX than with Hunspell.
* Bugs in libvoikko/hfst-ospell could crash the Firefox UI process or
lead to remote code execution, since the UI process isn't sandboxed.
* Mozilla couldn't push updates in case of critical bugs. At best,
Firefox could refuse to use known-buggy old versions of the external
library.
* Mozilla would have to instruct users (who don't get the library from
their Linux distro) to obtain and install third-party unsandboxed
native code.
* Someone would need to be willing to take the responsibility of
distributing the library for Windows/Mac (and possibly non-Debianish
Linux; e.g. Fedora appears to have dropped libvoikko).
* The party hosting the non-distro binaries would need to go
through the effort of creating a trustworthy distribution point
(https, Authenticode).
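
As a rough illustration of the "load it only if it is found in the
system library path" part of scenario #4, the probing logic could look
something like the Python sketch below. This is not Gecko code (Gecko
would do the equivalent in C++). The library names "voikko" and
"hfstospell" and both helper function names are assumptions made up
for the example.

```python
import ctypes
import ctypes.util


def load_optional_library(name):
    """Return a ctypes handle for a system library, or None if it is absent."""
    # find_library searches the platform's usual library locations and
    # returns None when nothing matching the name is installed.
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # The file was found but could not be loaded (e.g. wrong architecture).
        return None


def pick_spell_backend(candidates=("voikko", "hfstospell")):
    """Pick the first optional backend that loads; otherwise fall back.

    The candidate names are assumed library names, not verified ones.
    """
    for name in candidates:
        handle = load_optional_library(name)
        if handle is not None:
            return name, handle
    # Nothing optional was found: keep using the bundled Hunspell path.
    return "hunspell", None
```

If neither library is installed, pick_spell_backend() returns
("hunspell", None), which corresponds to the "worse UX than with
Hunspell" case above: the user gets no Voikko/hfst-ospell spell
checking until the library is installed by some other means.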

5) Mozilla engineering Gecko to dynamically load libvoikko/hfst-ospell
if it is found in the system library path on Linux. Leave locating
dictionaries up to libvoikko/hfst-ospell. Using the system spell
checker on Windows 8+.

Pros:
* No code size increase for users for whom libvoikko and/or
hfst-ospell isn't relevant.
* No need for Mozilla to involve multiple teams to add a
Mozilla-managed downloadable.
* No need for non-operating system spell checking code distribution.

Cons:
* Mac and Windows 7 users lose Finnish spell checking. All but Linux
users lose Greenlandic spell checking (unless a pluggable back end for
Windows 8+ is developed separately from Firefox concerns).
* Different platform-specific solutions on the Firefox side.

6) Mozilla engineering a way for Web Extensions to provide a spell
checking engine. Developer(s) of Mozvoikko compiling the library into
WebAssembly and packaging it as a Web Extension.

Pros:
* No user-visible paradigm shift from the present (but no improvement, either).
* Bugs in libvoikko/hfst-ospell would not be able to crash the Firefox
UI process or to cause unsandboxed execution of attacker-provided
native code.

Cons:
* Likely more engineering work on the Mozilla side than in the scenarios
where libvoikko/hfst-ospell would be used as native code in the UI
process.

- -

At this point, I'd like to understand if scenario #2 can be made
feasible. Can the code size impact be made smaller (reduce cons)? Can
the addressable audience be larger than what the present situation of
Finnish and Greenlandic having Firefox extensions suggests (increase
pros)?

For code size:

 * What's the purpose of the VFST vs. HFST distinction for Finnish vs.
everything else in libvoikko? Is VFST superior for Finnish and,
therefore, going to stay? Or is HFST more capable than VFST and,
therefore, a migration from VFST to HFST expected for Finnish? (Sorry
about basic questions like these. I have no clue about the underlying
tech.)

 * To what extent do the grammar checking and hyphenation functions
rely on the analysis code from spellchecking? That is, can substantial
code size reductions be expected from excluding grammar checking and
hyphenation from the build? (No disrespect for grammar checking or
hyphenation implied. It just happens that currently Firefox doesn't
support grammar checking at all and the hyphenation infrastructure in
Firefox already supports Finnish.)

For the addressable audience:

* Do there exist languages with very large numbers of users for
which a Hunspell dictionary cannot exist or for which a Hunspell
dictionary necessarily results in a poor user experience but for which
a VFST/HFST dictionary is likely to come into existence and would
yield a markedly better user experience?

* In the light of the previous question, what's the deal with Russian
showing up at https://gtsvn.uit.no/langtech/trunk/langs/rus/ ?
(Russian does appear to have a Hunspell dictionary.)

P.S. http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ talks
about GPLv3 rather than the Apache License 2.0. GitHub indicates that
LGPLv3 applies to the hfst repo while Apache License 2.0 applies to
the hfst-ospell repo. It would be good to clarify this on helsinki.fi.
-- 
Henri Sivonen
hsivonen at hsivonen.fi
https://hsivonen.fi/

