[libvoikko] Test cases for libvoikko/HFST needed
Francis Tyers
ftyers at prompsit.com
Mon Jan 18 18:40:07 EET 2010
On Mon, 18 Jan 2010 at 15:42 +0200, Sjur Moshagen wrote:
> On 17 Jan 2010 at 19:53, Flammie Pirinen wrote:
>
> > Harri Pitkänen wrote on 17 Jan 2010 at 19:19:
> >
> >> In order to make sure that the work needed for supporting multiple
> >> languages is not prioritised too high, I would like to have actual
> >> dictionaries available for at least two languages. Finnish is of
> >> course already supported, so just one other language would be enough.
> >> What I need is
> >>
> >> - Morphology licensed fully under a free license. Quality does not
> >>   matter much, but it should be actively maintained and either usable
> >>   or at least slowly becoming usable.
> >
> > I hope that at some point this year I will be able to create tools to
> > compile traditional hunspell, and maybe aspell or ispell, dictionaries
> > to HFST transducers, which could be ideal for testing. Until then, I
> > think there should be a fair number of lexc/twolc/xfst-style
> > morphologies available. The Sámi languages Francis mentioned are one
> > good resource.
>
> Strongly supported, especially since we can provide comparisons with already released spellers that use a closed-source speller engine. Just to be clear: only the speller engine is closed source. All linguistic source code relating to the Sámi analysers is licensed under the GPL.
>
> As some of you probably are aware, the company behind the closed-source speller engine went bankrupt last year, so having an open-source alternative is rather important to us. We would be very happy to help out and contribute whatever we can to make the Sámi Morph/HFST/LibVoikko 3 combo a successful one.
>
> Also, under the svn checkout that Francis posted, you will find a number of other languages that either have or will soon have support for the HFST tools. The languages cover a number of different language families and typologies as well as various stages of development. That is: if you want to test LibVoikko 3 with a larger number of languages with other typologies and orthographical conventions, there should be a number of possibilities within that checkout :)
>
> Languages which support HFST at present:
> * all Sámi languages (6-7 languages today, 3 tested with HFST, one with Cyrillic orthography)
> * Faroese
One clarification: the Faroese morphology is licensed under the GPL, but
the lemma list is not. The lemma list is copyrighted, although we continue
to battle to get it released under the GPL.
Fran