[libvoikko] SuggestionGenerator interface

Sjur Moshagen sjurnm at mac.com
Mon Apr 26 17:26:02 EEST 2010


On 27 Feb 2010, at 19:58, Harri Pitkänen wrote:

> On Thursday 25 February 2010, Flammie Pirinen wrote:
>> 2010-02-25, Harri Pitkänen wrote:
>>> Looks interesting. We should improve the suggestion generation code
>>> so that your suggestion mechanism could actually be plugged into use.
>>> Should not be too hard, most of the required abstractions are already
>>> there.
>> 
>> That would be ideal, or even further, as I think there were provision
>> for different suggestion algorithms in current voikko, same can be
>> easily done providing different transducers. I suppose at some point
>> the hard coded file names can be replaced by information read from
>> the .pro file or something?
> 
> Yes, that is possible.
> 
> I refactored the spelling suggestion code in SVN a bit. If you change your 
> HfstSuggestion class to extend 
> libvoikko::spellchecker::suggestion::SuggestionGenerator and modify 
> libvoikko::spellchecker::suggestion::SuggestionGeneratorFactory appropriately 
> it should now be possible to actually use the suggestions generated by 
> HfstSuggestion.
> 
> I have not yet implemented the configuration of suggestion generator backend 
> in the .pro file. It should be configured just the same way as Speller and 
> Analyzer are configured, but that does not work yet.

What is the status of the suggestion generator backend at the moment?

What I would hope for in the long(er) run is an implementation where the speller supports multiple such backends in parallel and also provides a user interface for the end user to choose among them, potentially at runtime. Even runtime changes to the settings for the main speller vocabulary would be useful in the future. Here are some use cases from a Sámi point of view (although most of the points are rather general):

* Some users would be better served by a more liberal suggestion generator, especially language learners and those who never learned to write. A more liberal suggestion generator would be most useful if combined with restrictions on the lexicons, such that free compounding and certain inflections are excluded from the accepted language.

* Professional writers would probably prefer a more restricted suggestion generator.

* Dialectal variation also gives rise to varying spelling errors, and it would be useful to turn certain suggestion types on or off in accordance with this variation, so that suggestions based on errors rarely made by writers speaking a certain dialect are not generated, and vice versa.

* Free compounding as well as certain inflections (especially possessive endings) have turned out to be problematic for pupils, language learners and some other writers, since they tend to mask spelling errors in more frequent word forms. In such cases it would be very helpful to turn off these forms, or to set a compounding limit (max 2 or whatever number of constituents in a compound). This could be done either by flagging certain tags (when using a full analysing transducer as speller) or by giving a small weight to each of these and adjusting an acceptance weight limit (when using an optimised, weighted automaton as speller).
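
To make the last point a bit more concrete, here is a minimal C++ sketch of how the two mechanisms (tag flagging and an acceptance weight limit) could be combined with a compounding limit. Everything in it is an assumption for illustration only: the names Analysis, AcceptanceSettings, isAccepted and isWordAccepted are hypothetical and not part of libvoikko or HFST, and it simply assumes that each analysis exposes its tags, its number of compound constituents and a total weight where lower means better.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result of analysing one word form (not a libvoikko type).
struct Analysis {
    std::vector<std::string> tags;  // morphological tags of this reading
    int compoundParts;              // number of constituents in the word
    double weight;                  // total path weight, lower is better
};

// Hypothetical settings that a user could adjust at runtime.
struct AcceptanceSettings {
    std::set<std::string> blockedTags;  // readings carrying any of these are rejected
    int maxCompoundParts;               // compounding limit, e.g. 2
    double maxWeight;                   // acceptance weight limit
};

// One reading passes only if it respects all three restrictions.
bool isAccepted(const Analysis& a, const AcceptanceSettings& s) {
    if (a.compoundParts > s.maxCompoundParts) return false;
    if (a.weight > s.maxWeight) return false;
    for (const std::string& tag : a.tags) {
        if (s.blockedTags.count(tag) > 0) return false;
    }
    return true;
}

// A word form is accepted if at least one of its readings passes.
bool isWordAccepted(const std::vector<Analysis>& analyses,
                    const AcceptanceSettings& s) {
    for (const Analysis& a : analyses) {
        if (isAccepted(a, s)) return true;
    }
    return false;
}
```

With such a filter sitting between the transducer and the speller result, a pupil's profile could block possessive tags and set maxCompoundParts to 2, while a professional writer's profile would leave blockedTags empty and use more permissive limits.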

It is of course possible to precompile certain common variants, and offer them as alternatives during installation, but it would offer much greater flexibility to the end user if we could provide an interactive user interface for these changes to be applied at runtime.
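
As a rough illustration of what "applied at runtime" could mean on the code side, here is a small C++ sketch of a registry holding several suggestion backends, one of which is active at any time. It is only a sketch under assumptions: SuggestionBackend, FixedListBackend and BackendRegistry are hypothetical names, and the interface is deliberately simplified; it is not the signature of the actual libvoikko SuggestionGenerator class.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified backend interface (not the libvoikko one).
class SuggestionBackend {
public:
    virtual ~SuggestionBackend() = default;
    virtual std::vector<std::string> suggest(const std::string& word) const = 0;
};

// Toy backend standing in for a real transducer: it always returns
// the list of suggestions it was constructed with.
class FixedListBackend : public SuggestionBackend {
public:
    explicit FixedListBackend(std::vector<std::string> suggestions)
        : suggestions_(std::move(suggestions)) {}
    std::vector<std::string> suggest(const std::string&) const override {
        return suggestions_;
    }
private:
    std::vector<std::string> suggestions_;
};

// Holds all loaded backends and remembers which one is active.
class BackendRegistry {
public:
    void add(const std::string& name, std::unique_ptr<SuggestionBackend> backend) {
        backends_[name] = std::move(backend);
    }
    // Switch the active backend; returns false if the name is unknown,
    // in which case the previous selection is kept.
    bool select(const std::string& name) {
        if (backends_.count(name) == 0) return false;
        active_ = name;
        return true;
    }
    // Delegate to the active backend; no suggestions if none is selected.
    std::vector<std::string> suggest(const std::string& word) const {
        auto it = backends_.find(active_);
        if (it == backends_.end()) return {};
        return it->second->suggest(word);
    }
private:
    std::map<std::string, std::unique_ptr<SuggestionBackend>> backends_;
    std::string active_;
};
```

A settings dialog (or a changed preference file) would then only need to call select() with the name of the backend the user picked, without reloading the speller itself.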

There are of course applications in which it isn't possible to add such a user interface. In those cases one could start by giving instructions on how to edit a simple user preference text file, and later perhaps make a small GUI application for editing the relevant settings.

Another reason to have dynamic settings instead of choosing speller variants during installation is when there are multiple users of the same machine. With only installation-time options, you essentially force the users to choose some compromise variant, which would probably not be optimal for any of the users.

There are of course many things to be done both on the backend side and on the user side (and in between) before the functionality above can be implemented, but the first step is always to start discussing it :)

Best regards,
Sjur



