[libvoikko] Another language for voikko: Avar!
Harri Pitkänen
hatapitk at iki.fi
Mon Mar 10 19:05:21 EET 2014
On Monday 10 March 2014 11:17:18 Sjur Moshagen wrote:
> After a short discussion with Fran, here is what I suggest:
>
> * add support for another error model in the zhfst file, tentatively named
> errmodel.encoding.hfst
OK.
> * add a check box to the speller configuration dialog, to allow automatic
> corrections of encoding errors
OK.
> * if the check box is checked, when the text is run through the acceptor,
> every unaccepted string that can be automatically turned into an accepted
> string using this error model is automatically changed to that string;
> other errors are treated the usual way
OK.
> * if the check box is _not_ checked, behave as now, and let encoding errors
> be handled by the default error model
OK. Should it be checked by default? And more generally, should the old or new
behavior be the default for applications that do not know about this new
setting? For many applications we cannot provide a settings dialog at all.
> * if such an error model is not found, the check box is greyed out or
> otherwise not accessible/settable
This sounds like a minor detail but would in fact be quite hard to implement.
Currently the preferences for libreoffice-voikko are the same for all
languages, while such an error model might be available for only some of them.
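
To illustrate the per-language availability problem, here is a minimal sketch
of how a caller could test whether a given zhfst file ships the extra error
model. It relies only on the fact that a zhfst file is a zip archive. The
member name errmodel.encoding.hfst is the tentative name from the proposal
above, and has_encoding_errmodel is a hypothetical helper, not an existing
libvoikko or hfst-ospell function.

```python
import zipfile

# Tentative member name from the proposal; an assumption, not an established convention.
ENCODING_ERRMODEL = "errmodel.encoding.hfst"


def has_encoding_errmodel(zhfst, member=ENCODING_ERRMODEL):
    """Return True if the zhfst archive (a path or a file object) contains `member`."""
    with zipfile.ZipFile(zhfst) as archive:
        return member in archive.namelist()
```

A shared preferences dialog would need the answer for every installed
language before it could decide whether to gray out the check box, which is
where the difficulty lies.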
> That is, using a special error model it should be possible to implement a
> safe autocorrect mode for encoding errors. Care has to be taken to ensure
> that the error model only generates one suggestion for each input.
>
> Does this sound like a viable option?
Yes, and if we are willing to accept that the check box is never grayed out it
should be relatively easy to implement.
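
The proposed behavior could be sketched roughly as follows. This is only an
illustration of the logic under discussion, not libvoikko code: check_word,
accepts and encoding_corrections are hypothetical names, and the toy error
model (mapping Latin "I", "l" and digit "1" to the Cyrillic palochka) and
the one-word lexicon are assumptions made up for the example.

```python
def check_word(word, accepts, encoding_corrections, autocorrect_encoding=False):
    """Return ("ok" | "corrected" | "error", resulting word).

    accepts(word) -> bool stands in for the acceptor.
    encoding_corrections(word) -> iterable of candidates stands in for the
    special encoding error model.
    """
    if accepts(word):
        return ("ok", word)
    if autocorrect_encoding:
        # Keep only candidates the acceptor accepts, dropping duplicates.
        candidates = list(dict.fromkeys(
            c for c in encoding_corrections(word) if accepts(c)))
        # Autocorrect only when the result is unambiguous.
        if len(candidates) == 1:
            return ("corrected", candidates[0])
    # Everything else is treated the usual way (flagged as an error).
    return ("error", word)


# Toy stand-ins, purely illustrative.
PALOCHKA = "\u04c0"
LOOKALIKES = str.maketrans({"I": PALOCHKA, "l": PALOCHKA, "1": PALOCHKA})
TOY_LEXICON = {"г" + PALOCHKA + "адан"}


def toy_accepts(word):
    return word in TOY_LEXICON


def toy_encoding_corrections(word):
    fixed = word.translate(LOOKALIKES)
    return [fixed] if fixed != word else []
```

With the option off, a word typed with a Latin "I" is simply reported as an
error, as now. With it on, the word is replaced silently, but only because
exactly one accepted candidate exists.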
> Also support for the OCR error model could be added at the same time (I
> believe multiple error models aren’t supported by the zhfst code now, don’t
> know whether this limitation is in the hfst-ospell or libvoikko code).
We could add this one as well but I believe these should still be independent
settings? Even with text produced by OCR software you might wish to choose
whether encoding errors should be corrected or not. I'm not really familiar
with OCR software and don't know if it is generally possible to force it to
only produce characters in a specific subset of Unicode.
The existing OCR setting is currently ignored in libvoikko if the HFST
suggestion backend is used.
Harri