[libvoikko] Libvoikko compiled for Android

Flammie Pirinen flammie at iki.fi
Mon Aug 13 23:47:49 EEST 2012


2012-08-13, Harri Pitkänen said:

> On Monday 13 August 2012, Harri Pitkänen wrote:
> > On Monday 13 August 2012, Flammie Pirinen wrote:
> > > Also, I think it should also be possible to just upgrade zhfst's
> > > into vfst's for android and work with that. 
> > 
> > This would be an option if the speller and error model were
> > unweighted and otherwise "small" enough to fit the size constraints
> > for VFST transducers. But it looks like zhfst error models are
> > weighted and weights are actually used (at least in the year old
> > sme error model I have here).
> > 
> > So we would have no way of sorting the suggestions, other than that
> > this could work easily.

This is a fair point, actually. Even though some effort has been put
into sorting the suggestions in these systems, the average PC office
user needs this feature the least: they mostly need the red underlines
and only rarely a few reasonable suggestions. Sorting matters much more
on most Android-based devices, where typing is often harder.

> Weights would be implemented by converting weights other than one
> into special epsilon symbols and a weighted transition will be
> replaced with two transitions: weight_transition ->
> original_transition. This is of course inefficient if there are lots
> of transitions with weights other than 1 but should work nicely if
> such transitions are relatively rare. And you cannot have very many
> different weight values in use so some approximation will be needed
> if original transducer is in a format that uses floating point
> weights with high precision.
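
To make the rewriting quoted above concrete, here is a minimal sketch
in Python. It is not taken from libvoikko or hfst-ospell. The tuple
layout, the "@W.n@" weight symbol and the "@0@" epsilon label are
assumptions made only for illustration, and the default weight of 1
follows the description above.

```python
from itertools import count

EPSILON = "@0@"  # hypothetical epsilon label, chosen for this sketch


def split_weighted(transitions, default_weight=1):
    """Turn weighted transitions into unweighted ones.

    transitions: list of (src, in_sym, out_sym, dst, weight) tuples
    with integer state numbers. A transition whose weight differs from
    default_weight becomes two transitions through a fresh state:
    weight_transition -> original_transition.
    """
    highest = max((max(t[0], t[3]) for t in transitions), default=-1)
    fresh_states = count(highest + 1)
    result = []
    for src, in_sym, out_sym, dst, weight in transitions:
        if weight == default_weight:
            # Default weight: keep the transition, just drop the weight.
            result.append((src, in_sym, out_sym, dst))
            continue
        mid = next(fresh_states)
        weight_sym = f"@W.{weight}@"  # hypothetical special symbol
        result.append((src, weight_sym, EPSILON, mid))
        result.append((mid, in_sym, out_sym, dst))
    return result
```

Every non-default weight costs one extra state and one extra
transition here, which is why this only pays off when such transitions
are relatively rare.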

Most of the less-resourced languages that we deal with will have
weights that are multiples of one like that, usually something similar
to what voikko has now
<https://github.com/voikko/corevoikko/blob/master/libvoikko/doc/oikoluku-korjausehdotukset.txt>.
I think it should be doable with a handful of weights.

The other languages won't be that different in practice, since they
just use probabilities of some factors, mostly word forms. Those weights
can be arranged into a few classes rather easily as well, as there will
be a clear maximal and minimal weight in the automaton.

In the end, both weighting schemes should be convertible to ones or a
small number of classes by very simple algorithms.
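
As a rough illustration of such a very simple algorithm, the sketch
below splits the range between the minimal and maximal weight into
equal-width classes. The function name, the choice of four classes and
the equal-width binning are my assumptions for the example, not
something hfst-ospell or libvoikko does today.

```python
def quantize_weights(weights, classes=4):
    """Map floating point weights to integer classes 1..classes.

    The range between the smallest and largest weight is cut into
    equally wide bins. If all weights are equal, everything lands in
    class 1.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [1 for _ in weights]
    step = (hi - lo) / classes
    # The maximal weight would fall just past the last bin, so clamp it.
    return [min(classes, int((w - lo) / step) + 1) for w in weights]
```

For example, quantize_weights([0.0, 2.5, 5.0, 10.0]) gives
[1, 2, 3, 4]. Binning by rank or on a logarithmic scale might suit
probability-based weights better, but the idea stays the same.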

Revision r2608 of hfst-ospell trunk should have working --enable-zhfst
and --enable-xml switches.

-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>


