[libvoikko] HFST backend is no longer experimental

Harri Pitkänen hatapitk at iki.fi
Mon Mar 18 19:49:49 EET 2013


On Monday 18 March 2013, Sjur Moshagen wrote:
> On 18 Mar 2013 at 17:28, Harri Pitkänen wrote:
> > So there seem to be two issues:
> > 
> > - Suggestions are not sorted as they should be. It looks like libvoikko
> > uses the ospell library in a way that ignores the weights. I'll fix that.
> 
> Ok.

I have fixed this now.

> My assumption is that the case handling & mapping of libvoikko is fast and
> reliable (also across script systems), so I would suggest that we assume
> that the fst's only contain canonical case, and nothing else. This should
> result in smaller and faster fst's.
> 
> Given this assumption, the bug is actually in the fst, and not in the code
> of either libvoikko or hfst-ospell. It should be easy to fix, though.
> 
> WDYT?

Sounds good to me. It should work if

 - libvoikko knows about case mappings for the language (I think it does for
   the languages that are being worked on)
 - the language allows all (or at least those that matter) of the words to be
   written in these three (but no other) forms:
   * in canonical case
   * initial letter capitalized, other letters in canonical case
   * all letters capitalized
 - the error model is able to produce the necessary character case
   corrections, so that a capitalized first letter is suggested whenever the
   first letter needs to be capitalized.
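
To make the second condition concrete, here is a minimal sketch in Python of
how a checker could accept exactly those three forms when the lexicon stores
only canonical case. This is not libvoikko's actual code: the function names
and the set-based lexicon are hypothetical, and Python's str.upper() and
str.lower() stand in for the language-specific case mappings.

```python
def cased_forms(entry):
    """Return the three spellings allowed for a canonical-case entry."""
    return {
        entry,                          # canonical case
        entry[:1].upper() + entry[1:],  # initial letter capitalized
        entry.upper(),                  # all letters capitalized
    }


def canonical_candidates(word):
    """Return canonical-case entries that `word` might be a cased form of."""
    return {
        word,                            # already canonical
        word[:1].lower() + word[1:],     # undo initial capitalization
        word.lower(),                    # undo all caps (lowercase entry)
        word[:1] + word[1:].lower(),     # undo all caps (capitalized entry)
    }


def is_accepted(word, lexicon):
    """Accept `word` only if it is one of the three forms of some entry.

    `lexicon` is a hypothetical set of canonical-case words standing in
    for a lookup in the fst.
    """
    return any(
        candidate in lexicon and word in cased_forms(candidate)
        for candidate in canonical_candidates(word)
    )


if __name__ == "__main__":
    lexicon = {"talo", "Helsinki"}
    for word in ["talo", "Talo", "TALO", "tALO", "helsinki", "HELSINKI"]:
        print(word, is_accepted(word, lexicon))
```

With this approach a proper noun such as "Helsinki" is stored capitalized, so
"helsinki" is rejected and it is up to the error model to suggest the
capitalized form.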

I think it would be good to document this in the file format specification to 
avoid confusion:

  http://www.divvun.no/no/future/proofing/lexfile-spec.html

Harri


