[libvoikko] Test cases for libvoikko/HFST needed

Harri Pitkänen hatapitk at iki.fi
Mon Jan 18 19:50:39 EET 2010


On Monday 18 January 2010, Sjur Moshagen wrote:
> On 17 Jan 2010 at 19:53, Flammie Pirinen wrote:
> > On 17 Jan 2010 at 19:19, Harri Pitkänen wrote:
> >> In order to make sure that the work needed for supporting multiple
> >> languages is not prioritised too high I would like to have actual
> >> dictionaries available for at least two languages. Finnish is of
> >> course already supported, so just one other language would be enough.
> >> What I need is
> >>
> >> - Morphology licensed fully under a free license. Quality does not
> >> matter much, but it should be actively maintained and either usable
> >> or at least slowly becoming usable.
> >
> > I hope that at some point this year I will be able to create tools to
> > compile traditional hunspell and maybe aspell or ispell dictionaries
> > to HFST transducers, which could be ideal for testing. In the meantime,
> > I think there should be some lexc/twolc/xfst style morphologies
> > available. The Sámi languages Francis mentioned are one good resource.
> 
> Strongly supported, especially since we can provide comparisons with
> already released spellers using a closed-source speller engine. Just to be
> clear: it is only the speller engine that is closed source - all
> linguistic source code relating to the Sámi analysers is licensed under
> GPL.
> 
> As some of you probably are aware, the company behind the closed-source
> speller engine went bankrupt last year, so having an open-source
> alternative is rather important to us. We would be very happy to help out
> and contribute whatever we can to make the Sámi Morph/HFST/LibVoikko 3
> combo a successful one.

Here are some things that would speed up development and where you could
most likely help:

- Improve HFST public headers so that building libvoikko against HFST becomes
possible without removing quality checks from our build system. It should be
possible to include HFST headers in a compilation unit that is built using
  g++ -Wall -Werror -pedantic

- Make sure that HFST can be built on Windows using MS Visual C++.

- Improve src/spellchecker/HfstSpeller.cpp to work with flag diacritics (Tommi 
said he will try to fix this) and implement checking of correct 
capitalisation.

- Write test cases for HfstSpeller.

- Provide Debian packages for HFST and Sámi morphology.
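
To illustrate the capitalisation item above, here is a minimal sketch of one
possible check. It assumes ASCII-only words and a lexicon that stores each
word in its canonical case. The names CasePattern, classifyCase,
toLowerAscii and isCapitalisationAcceptable are made up for this example and
are not part of libvoikko or HFST. A real implementation would need
Unicode-aware case handling, since Finnish and Sámi both use non-ASCII
letters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classification of a word's letter case (not a libvoikko type).
enum class CasePattern { AllLower, FirstUpper, AllUpper, Mixed };

// Classify an ASCII word; characters that are not letters are ignored.
CasePattern classifyCase(const std::string& word) {
    std::size_t upper = 0;
    std::size_t lower = 0;
    bool seenLetter = false;
    bool firstLetterUpper = false;
    for (unsigned char c : word) {
        if (!std::isalpha(c)) continue;
        const bool isUp = std::isupper(c) != 0;
        if (!seenLetter) {
            firstLetterUpper = isUp;
            seenLetter = true;
        }
        if (isUp) ++upper; else ++lower;
    }
    if (upper == 0) return CasePattern::AllLower;
    if (firstLetterUpper && upper == 1) return CasePattern::FirstUpper;
    if (lower == 0) return CasePattern::AllUpper;
    return CasePattern::Mixed;
}

// ASCII-only lowercasing, used to compare words case-insensitively.
std::string toLowerAscii(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Decide whether `input` is an acceptable capitalisation of `lexiconForm`.
// Assumed rules: lower-case lexicon words may be written Capitalised or in
// ALL CAPS; Capitalised lexicon words (proper nouns) may be written in
// ALL CAPS; anything else must match the lexicon form exactly.
bool isCapitalisationAcceptable(const std::string& input,
                                const std::string& lexiconForm) {
    if (input == lexiconForm) return true;
    if (toLowerAscii(input) != toLowerAscii(lexiconForm)) return false;
    const CasePattern in = classifyCase(input);
    switch (classifyCase(lexiconForm)) {
        case CasePattern::AllLower:
            return in == CasePattern::FirstUpper || in == CasePattern::AllUpper;
        case CasePattern::FirstUpper:
            return in == CasePattern::AllUpper;
        default:
            return false;
    }
}

// A few checks of the kind a test case for the speller could contain.
void exampleChecks() {
    assert(isCapitalisationAcceptable("Sana", "sana"));
    assert(isCapitalisationAcceptable("HELSINKI", "Helsinki"));
    assert(!isCapitalisationAcceptable("helsinki", "Helsinki"));
    assert(!isCapitalisationAcceptable("sAna", "sana"));
}
```

The rules here are only one plausible policy. The point is that the lookup
in the transducer and the case policy can be kept separate, which also makes
the policy easy to cover with test cases.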


I will implement the new hyphenator interface and multiple language support
for libvoikko and openoffice.org-voikko. It will take some time, though,
because during the next few months I must spend more time on paid work than
I originally planned.

Harri


