[libvoikko] HFST backend is no longer experimental
Harri Pitkänen
hatapitk at iki.fi
Mon Mar 18 19:49:49 EET 2013
On Monday 18 March 2013, Sjur Moshagen wrote:
> Den 18. mar 2013 kl. 17:28 skrev Harri Pitkänen:
> > So there seem to be two issues:
> >
> > - Suggestions are not sorted as they should be. It looks like libvoikko uses
> > the ospell library in a way that ignores the weights. I'll fix that.
>
> Ok.
I have fixed this now.
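A minimal sketch of what honouring the weights means, in Python rather than
the actual libvoikko code. It assumes each suggestion arrives as a
(word, weight) pair and that a lower weight means a better suggestion; the
function name sort_by_weight is made up for this illustration.

```python
from typing import List, Tuple


def sort_by_weight(suggestions: List[Tuple[str, float]]) -> List[str]:
    """Return suggestion words ordered from best to worst.

    Assumption: a lower weight means a better suggestion. sorted() is
    stable, so suggestions with equal weight keep their incoming order.
    """
    ordered = sorted(suggestions, key=lambda pair: pair[1])
    return [word for word, _weight in ordered]
```

Dropping the weights before this step leaves the list in whatever order the
corrections happened to be generated, which is the symptom described above.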
> My assumption is that the case handling & mapping of libvoikko is fast and
> reliable (also across script systems), so I would suggest that we assume
> that the fst's only contain canonical case, and nothing else. This should
> result in smaller and faster fst's.
>
> Given this assumption, the bug is actually in the fst, and not in the code
> of either libvoikko or hfst-ospell. It should be easy to fix, though.
>
> WDYT?
Sounds good to me. It should work if
- libvoikko knows about case mappings for the language (I think it does for
  the languages that are being worked on)
- the language allows all (or at least those that matter) of the words to be
  written in these three (but no other) forms:
  * in canonical case
  * initial letter capitalized, other letters in canonical case
  * all letters capitalized
- the error model is able to produce the necessary character case
  corrections so that a capitalized first letter is suggested when it is
  necessary to capitalize the first letter.
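
The three accepted forms can be sketched as follows. This is only an
illustration in Python, not libvoikko code: the lexicon is modelled as a
plain set of canonical-case words instead of an fst, the names case_variants
and accepts are made up, and Python's built-in str.upper() stands in for the
language-specific case mapping.

```python
from typing import Set


def case_variants(canonical: str) -> Set[str]:
    """Return the three surface forms allowed for one canonical-case word."""
    if not canonical:
        return set()
    return {
        canonical,                              # canonical case
        canonical[0].upper() + canonical[1:],   # initial letter capitalized
        canonical.upper(),                      # all letters capitalized
    }


def accepts(word: str, lexicon: Set[str]) -> bool:
    """True if word is one of the allowed forms of some lexicon entry.

    Simplification: iterates over the whole lexicon, which a real
    fst-based lookup would not do.
    """
    return any(word in case_variants(entry) for entry in lexicon)
```

With a lexicon of {"talo", "Helsinki"}, this accepts "talo", "Talo", "TALO",
"Helsinki" and "HELSINKI", but rejects "tAlo" and "helsinki". Suggesting
"Helsinki" for "helsinki" is then the job of the error model.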
I think it would be good to document this in the file format specification to
avoid confusion:
http://www.divvun.no/no/future/proofing/lexfile-spec.html
Harri