[libvoikko] SuggestionGenerator interface

Harri Pitkänen hatapitk at iki.fi
Mon Apr 26 20:13:13 EEST 2010


On Monday 26 April 2010, Sjur Moshagen wrote:
> What is the status of the suggestion generator backend at the moment?

Technically it works, but the quality of the suggestions could still be improved.

> What I would hope for in the long(er) run, is an implementation where the
>  speller would support multiple such backends in parallel, and also
>  provide a user interface for the end user to choose among them,
>  potentially at runtime.

It is already possible to run two backends in parallel and switch between them
at runtime. This feature was implemented to support specialized algorithms for
correcting typed text and errors coming from optical character recognition
software. The HFST backend does not take advantage of it, though.
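
As a rough illustration of the idea (this is not libvoikko's actual code), 
switching between two suggestion generators at runtime could be sketched as 
follows. All names here (Speller, set_mode, the toy lexicon and the confusion 
table) are hypothetical, and the two generators only stand in for real typing 
and OCR correction algorithms.

```python
from typing import List

# Hypothetical toy lexicon; a real backend would query a morphological analyser.
LEXICON = {"kissa", "koira", "talo"}

# Character confusions typical of OCR output (illustrative only).
OCR_CONFUSIONS = {"1": "l", "0": "o", "I": "l"}


def typing_suggestions(word: str) -> List[str]:
    """Toy 'typing' generator: try swapping each pair of adjacent letters."""
    results: List[str] = []
    for i in range(len(word) - 1):
        candidate = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if candidate in LEXICON and candidate not in results:
            results.append(candidate)
    return results


def ocr_suggestions(word: str) -> List[str]:
    """Toy 'OCR' generator: replace one commonly misrecognised character."""
    results: List[str] = []
    for i, ch in enumerate(word):
        if ch in OCR_CONFUSIONS:
            candidate = word[:i] + OCR_CONFUSIONS[ch] + word[i + 1:]
            if candidate in LEXICON and candidate not in results:
                results.append(candidate)
    return results


class Speller:
    """Holds several generators and lets the caller pick one at runtime."""

    def __init__(self) -> None:
        self._generators = {"typing": typing_suggestions, "ocr": ocr_suggestions}
        self._active = "typing"

    def set_mode(self, mode: str) -> None:
        if mode not in self._generators:
            raise ValueError(f"unknown suggestion mode: {mode}")
        self._active = mode

    def suggest(self, word: str) -> List[str]:
        return self._generators[self._active](word)
```

A caller would create one Speller, call suggest() as usual, and call 
set_mode("ocr") when the text being checked comes from a scanner.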

>  Even run-time changes of settings for the main
>  speller vocabulary would be useful in the future. Here are some use-cases
>  from a Sámi point-of-view (although most of the points are rather
>  general):
> 
> * Some users would be more helped with a more liberal suggestion generator,
>  especially language learners and those who never learned to write. A more
>  liberal suggestion generator would be most useful if combined with
>  restrictions on the lexicons, such that free compounding and certain
>  inflections are excluded from the accepted language.
> 
> * professional writers would probably rather see a more restricted
>  suggestion generator
> 
> * dialectal variation also gives varying spelling errors, and it would be
>  useful to turn on or off certain suggestion types in accordance with this
>  variation, such that suggestions based on errors rarely made by writers
>  speaking a certain dialect are not generated, and the other way around
> 
> * free compounding as well as certain inflections (especially possessive
>  endings) have turned out to be problematic for pupils, language learners
>  and some other writers, since they tend to mask spelling errors of more
>  frequent word forms. In such cases it would be very helpful to turn off
>  these forms, or set a compounding limit (max 2 or whatever number of
>  constituents in a compound). This could be done either by flagging certain
>  tags (when using a full analysing transducer as speller) or by giving a
>  small weight to each of these, and adjusting an acceptance weight limit
>  (when using an optimised, weighted automaton as speller).
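
To make the weight-limit idea above concrete, here is a minimal sketch. The 
penalty values, the Analysis class and the is_accepted function are all 
made up for illustration; a real weighted transducer would carry the weights 
itself, and the numbers would need tuning per language.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical penalties; not taken from any real lexicon.
COMPOUND_BOUNDARY_WEIGHT = 1.0
POSSESSIVE_WEIGHT = 1.5


@dataclass
class Analysis:
    """One analysis of a word form: its compound parts plus one flag."""
    parts: List[str]              # compound constituents, e.g. ["talo", "kissa"]
    has_possessive: bool = False  # True if the form carries a possessive ending

    def weight(self) -> float:
        # Each compound boundary costs a fixed penalty.
        total = (len(self.parts) - 1) * COMPOUND_BOUNDARY_WEIGHT
        if self.has_possessive:
            total += POSSESSIVE_WEIGHT
        return total


def is_accepted(analyses: List[Analysis], weight_limit: float) -> bool:
    """Accept the word if at least one analysis stays within the limit."""
    return any(a.weight() <= weight_limit for a in analyses)
```

With these numbers, a limit of 1.0 allows compounds of at most two parts and 
rejects possessive forms, while a higher limit gives a more liberal speller.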

Generalizing the choice between OCR and typing suggestions to something more 
versatile would indeed make sense. I'm not sure how much configurability is 
reasonable to export through the API since you can always edit the 
configuration file.

> It is of course possible to precompile certain common variants, and offer
>  them as alternatives during installation, but it would offer much greater
>  flexibility to the end user if we could provide an interactive user
>  interface for these changes to be applied at runtime.
> 
> There are of course applications in which it isn't possible to add such a
>  user interface. In those cases one could start with giving instructions on
>  how to edit a simple user preference text file, and later perhaps make a
>  small GUI application for editing the relevant settings.
> 
> Another reason to have dynamic settings instead of choosing speller
>  variants during installation is when there are multiple users of the same
>  machine. With only installation-time options, you essentially force the
>  users to choose some compromise variant, which would probably not be
>  optimal for any of the users.
> 
> There are of course many things to be done both on the backend side and on the
>  user side (and in between) before the functionality above would be
>  implemented, but the first step is always to start discussing it:)

At the level of configuration files we already have the infrastructure needed
for an arbitrary number of backend settings, as long as each setting can be
represented as a single line of text. Currently we just don't use it.
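
To show what "one setting per line" could look like in practice, here is a 
small sketch of a parser for such a file. The "key: value" syntax and the 
setting names (suggestion-mode, max-compound-parts) are assumptions made for 
this example and do not describe the existing configuration file format.

```python
from typing import Dict


def parse_settings(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines; blank lines and '#' comments are skipped."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical settings file content.
EXAMPLE = """\
# backend settings (names invented for illustration)
suggestion-mode: typing
max-compound-parts: 2
"""
```

Values are kept as plain strings here; each backend would convert and validate 
the settings it understands and ignore the rest.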

User interfaces are harder. I can imagine adding a dropdown box to the
openoffice.org-voikko configuration dialog for choosing from a list of
predefined algorithms. Anything more complicated would probably confuse users
more than help them. For anything beyond that, it would help to have a separate
GUI application that advanced users could use to create the configurations.

Harri


