[libvoikko] SuggestionGenerator interface

Flammie Pirinen flammie at iki.fi
Tue Apr 27 10:14:04 EEST 2010


2010-04-26, Harri Pitkänen wrote:

> On Monday 26 April 2010, Sjur Moshagen wrote:
> > What is the status of the suggestion generator backend at the
> > moment?
> 
> Technically it works, quality of the suggestions could still be
> improved.

On the HFST side, the nice thing is that the suggestion algorithm logic
is offloaded to the transducers (and thus to the makers of the
dictionaries and language models), so 99% of the time improving
suggestions will just mean creating a better error model transducer and
saving it as sug.hwfst.
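To illustrate that division of labour, here is a minimal sketch in which the only fixed step is "run the input through an error model, keep the outputs the lexicon accepts". Swapping in a better error model changes the suggestions without touching that step. It is a plain-Python stand-in, not HFST code: the hypothetical `edit_distance_one` function plays the role of a simple error model, and a word list plays the role of the lexicon transducer. With HFST, both would be transducers, and the error model would come from sug.hwfst.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical alphabet for the toy error model below.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäö"

# An error model maps a (possibly misspelled) word to candidate strings.
ErrorModel = Callable[[str], Set[str]]


def edit_distance_one(word: str) -> Set[str]:
    """Toy error model: every string one deletion, transposition,
    substitution or insertion away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | substitutes | inserts


def suggest(word: str, lexicon: Iterable[str],
            error_model: ErrorModel = edit_distance_one) -> List[str]:
    """Keep the error-model outputs that the lexicon accepts, excluding
    the input itself. Better suggestions come from passing a better
    `error_model`; this function stays the same."""
    words = set(lexicon)
    return sorted(c for c in error_model(word) if c in words and c != word)
```

For example, `suggest("kisas", ["kissa", "kisa", "koira"])` yields `kisa` (one deletion) and `kissa` (one transposition). A weighted error model transducer would also rank candidates by error weight. This sketch only sorts them alphabetically.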

> > What I would hope for in the long(er) run, is an implementation
> > where the speller would support multiple such backends in
> > parallel, and also provide a user interface for the end user to
> > choose among them, potentially at runtime.
> 
> It is already possible to run two backends in parallel and switch
> between them at runtime. This feature was implemented to support
> specialized algorithms for correcting typed text and errors coming
> from optical character recognition software. The HFST backend does
> not take advantage of it, though.
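Running several suggestion backends side by side and choosing the active one at runtime can be pictured as a small dispatcher keyed by mode. This is a hypothetical sketch: the class name, the mode names and the method names are invented for illustration and are not libvoikko's actual interface.

```python
from typing import Callable, Dict, List

# A suggestion generator takes a misspelled word and returns candidates.
Generator = Callable[[str], List[str]]


class SuggestionSwitch:
    """Hypothetical holder for several suggestion backends loaded in
    parallel. The caller picks which one answers and can change that
    choice at runtime."""

    def __init__(self, generators: Dict[str, Generator], default: str) -> None:
        if default not in generators:
            raise KeyError(f"unknown suggestion mode: {default}")
        self._generators = dict(generators)
        self.mode = default

    def set_mode(self, mode: str) -> None:
        """Switch the active backend; rejects modes that were not loaded."""
        if mode not in self._generators:
            raise KeyError(f"unknown suggestion mode: {mode}")
        self.mode = mode

    def suggest(self, word: str) -> List[str]:
        """Delegate to whichever backend is currently active."""
        return self._generators[self.mode](word)
```

For instance, a switch built with a `typing` generator and an `ocr` generator answers from the typing one by default, and after `set_mode("ocr")` later `suggest` calls go to the OCR generator. Both backends stay loaded the whole time, so switching is just a lookup.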

That's one thing I suppose would be nice to think through at some
point. While it is entirely possible right now to hardcode all sorts of
files like ocr-sug.hwfst, qwerty-sug.hwfst and so forth, it may be
worthwhile to let dictionary writers install an arbitrary number of
suggestion engines alongside their dictionaries, or something along
those lines.

> On the level of configuration files we already have the
> infrastructure needed for implementing an arbitrary number of settings
> for configuring the backends, as long as the settings can be
> represented as a single line of text. Currently we just don't use
> that.

So this is probably something I should implement next for the HFST
backend. I suppose it doesn't need to be more complex than allowing a
list of filenames as the setting for each dictionary and algorithm
type.
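Because each setting has to fit on a single line of text, a list of filenames per algorithm type could simply be a comma-separated value, with one such set of lines per dictionary. The sketch below assumes a made-up key scheme `suggestion.<type>`. Neither the key names nor the line syntax are libvoikko's actual configuration format.

```python
from typing import Dict, List

# Hypothetical prefix for suggestion settings; not an existing libvoikko key.
PREFIX = "suggestion."


def parse_suggestion_settings(lines: List[str]) -> Dict[str, List[str]]:
    """Map each algorithm type to its ordered list of transducer files.

    Each setting is one line of the (assumed) form
        suggestion.<type> = file1.hwfst, file2.hwfst
    Blank lines and '#' comments are skipped, and settings with other
    keys are ignored. A line without '=' raises ValueError.
    """
    settings: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed setting: {raw!r}")
        key = key.strip()
        if not key.startswith(PREFIX):
            continue  # some unrelated setting of the dictionary
        algorithm = key[len(PREFIX):]
        settings[algorithm] = [f.strip() for f in value.split(",") if f.strip()]
    return settings
```

With lines such as `suggestion.typing = qwerty-sug.hwfst, sug.hwfst` and `suggestion.ocr = ocr-sug.hwfst`, a dictionary can ship any number of suggestion engines without the backend hardcoding their filenames. The order of files within a line could then decide which engine is consulted first.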

> User interfaces are harder. I can imagine adding a dropdown box to 
> openoffice.org-voikko configuration dialog for choosing from a list
> of predefined algorithms. Anything more complicated would probably
> confuse users more than help them. In that case it would help to have
> a separate GUI application for advanced users that could be used to
> create the configurations.

I imagine the primary users of the different algorithms and
dictionaries offered via configuration would be power users like us and
distribution developers, both of whom can edit text files without a
dedicated GUI application, although having one would never hurt either.
For other end users, a list of carefully selected and verified options
should indeed be enough; we mostly wouldn't want e.g. oo.o to show a long
list of names for edit distance variants or for different backends
handling the dictionaries.

Now, in the case of a multilingual Voikko, it might be reasonable to let
dictionary makers specify which subset of the selected algorithms or
options is implemented and/or makes sense to offer as an option?

-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>
