[libvoikko] Another language for voikko: Avar!

Flammie Pirinen flammie at iki.fi
Wed Mar 12 16:28:03 EET 2014


2014-03-11, Francis Tyers wrote:

> On Mon, 10 Mar 2014 at 19:05 +0200, Harri Pitkänen wrote:
> > On Monday 10 March 2014 11:17:18 Sjur Moshagen wrote:

> > > That is, using a special error model it should be possible to
> > > implement a safe autocorrect mode for encoding errors. Care has
> > > to be taken to ensure that the error model only generates one
> > > suggestion for each input.
> 
> This might be difficult. Perhaps it would be better to just fix it if
> there is only one suggestion, i.e. skip cases of ambiguity?

I think this sounds like the best solution. There are a lot of things we
expect with respect to the sanity of finite-state models anyway that cannot
be checked without doing something as demanding as just looking up the
results. So using the result when there is exactly one is probably the
best option. I can imagine there are encodings that create real ambiguity
too, but those must go through regular error correction instead; this
feature would be for really minor and straightforward cases.
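
To make the "exactly one result" rule concrete, here is a minimal sketch
in Python. It is not hfst-ospell code: the names unambiguous_fix,
toy_accepts and toy_candidates are hypothetical, and the toy encoding
model (a "?" standing for a lost "ä" or "ö") is an assumption made only
for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Optional


def unambiguous_fix(word: str,
                    accepts: Callable[[str], bool],
                    candidates: Callable[[str], Iterable[str]]) -> Optional[str]:
    """Return the single accepted correction for word, or None."""
    if accepts(word):
        return None  # already accepted, nothing to fix
    # Use a set: the same string may be produced more than once.
    fixes = {c for c in candidates(word) if accepts(c)}
    if len(fixes) == 1:
        return next(iter(fixes))
    return None  # no fix or several fixes: leave it to regular correction


# Toy stand-ins for the acceptor and the encoding error model (assumptions).
TOY_LEXICON = {"päivä", "tälli", "tölli"}


def toy_accepts(word: str) -> bool:
    return word in TOY_LEXICON


def toy_candidates(word: str) -> Iterable[str]:
    # Pretend every "?" is a lost "ä" or "ö" and try all combinations.
    slots = [("ä", "ö") if ch == "?" else (ch,) for ch in word]
    return ("".join(p) for p in product(*slots))
```

With these toy definitions "p?iv?" has exactly one accepted candidate and
would be fixed, while "t?lli" has two and is left for the regular error
model.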


On Monday 10 March 2014 11:17:18 Sjur Moshagen wrote:

> After a short discussion with Fran, here is what I suggest:
> 
> * add support for another error model in the zhfst file, tentatively
> named errmodel.encoding.hfst
> * add a check box to the speller configuration dialog, to allow
> automatic corrections of encoding errors
> * if the check box is checked, when the text is run through the
> acceptor, every unaccepted string that can be automatically turned
> into an accepted string using this error model is automatically
> changed to that string; other errors are treated the usual way
> * if the check box is _not_ checked, behave as now, and let encoding
> errors be handled by the default error model
> * if such an error model is not found, the check box is greyed out or
> otherwise not accessible/setable
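
The behaviour in the list above can be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions: the class
SpellerSketch and its fields are hypothetical and are not the
ZHfstOspeller API, the toy encoding model only repairs "ä" that was
decoded with the wrong charset, and the "exactly one fix" condition comes
from the follow-up discussion rather than from the list itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class SpellerSketch:
    accepts: Callable[[str], bool]
    # None models a speller archive that has no encoding error model.
    encoding_model: Optional[Callable[[str], Iterable[str]]] = None
    autocorrect_enabled: bool = False  # state of the check box

    @property
    def autocorrect_available(self) -> bool:
        # The UI would grey out the check box when this is False.
        return self.encoding_model is not None

    def check(self, tokens: List[str]) -> Tuple[List[str], List[str]]:
        """Return (possibly corrected tokens, tokens left for normal handling)."""
        use_auto = self.autocorrect_enabled and self.autocorrect_available
        corrected, flagged = [], []
        for tok in tokens:
            if self.accepts(tok):
                corrected.append(tok)
                continue
            if use_auto:
                fixes = {c for c in self.encoding_model(tok) if self.accepts(c)}
                if len(fixes) == 1:
                    corrected.append(fixes.pop())  # safe automatic change
                    continue
            corrected.append(tok)  # unchanged
            flagged.append(tok)    # handled by the default error model
        return corrected, flagged


# Toy acceptor and encoding model (assumptions, for illustration only).
def toy_accepts(word: str) -> bool:
    return word in {"hyvää", "päivää"}


def toy_encoding_model(word: str) -> Iterable[str]:
    return [word.replace("Ã¤", "ä")]
```

With the check box on, "hyvÃ¤Ã¤" would be silently changed to "hyvää",
while an ordinary misspelling such as "paivaa" is still flagged as usual.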


Is there anything that needs to be done to the hfstospell API for these
features? I haven't had time to update it for the analysers yet, so doing
all of these in one batch would be a good thing. Basically, the data
structures in ZHfstOspeller have more or less always supported multiple
automata like this, but there is perhaps not enough functionality to use
them in a real application with a UI yet. I hope to get back to a stable
internet connection and have a bit of time in the next couple of days, so
I can make some of these changes.

-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>