[libvoikko] Test cases for libvoikko/HFST needed

Harri Pitkänen hatapitk at iki.fi
Sun Jan 17 19:19:51 EET 2010


Development of libvoikko 3.0 will start soon. Support for languages other than 
Finnish is one of the goals for this release, but there are others as well, 
such as better multithreading and an extended hyphenator API.

In order to make sure that the work needed for supporting multiple languages 
is not prioritized too high, I would like to have actual dictionaries available 
for at least two languages. Finnish is of course already supported, so just 
one other language would be enough. What I need is:

- Morphology licensed fully under a free license. Quality does not matter 
much, but it should be actively maintained and either usable or at least 
slowly becoming usable.

- Analyzer implementation for libvoikko. I guess the current HfstAnalyzer will 
work with small changes. A separate speller implementation can be provided (or 
HfstSpeller can be used).

- A few automated test cases for ensuring that at least basic spell checking 
works correctly. This is because I may not understand the language at all. 
PyUnit tests are fine, something similar to

        def testSpell(self):
                self.failUnless(self.voikko.spell(u"määrä"))
                self.failIf(self.voikko.spell(u"määä"))

in python/libvoikkotests.py within the libvoikko sources is enough.
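
As a rough illustration of what such a test case for a new language could 
look like, here is a minimal sketch. It is not taken from the libvoikko 
sources: FakeSpeller, BasicSpellTest and run_suite are hypothetical names, and 
FakeSpeller is only a stand-in so that the example runs without libvoikko or 
a dictionary installed. In a real test, setUp would create the actual 
libvoikko handle for the language in question, and the word lists would hold 
words from that language.

```python
import unittest


class FakeSpeller:
    """Hypothetical stand-in for a libvoikko handle.

    It accepts only the words it was constructed with, so the example
    can run without libvoikko or any dictionary being installed.
    """

    def __init__(self, words):
        self._words = frozenset(words)

    def spell(self, word):
        # True if the word is considered correctly spelled.
        return word in self._words


class BasicSpellTest(unittest.TestCase):
    # A few known-good and known-bad words for the language under test.
    CORRECT = ["määrä"]
    INCORRECT = ["määä"]

    def setUp(self):
        # Assumption: a real test would create the libvoikko handle here.
        self.voikko = FakeSpeller(self.CORRECT)

    def test_correct_words_accepted(self):
        for word in self.CORRECT:
            self.assertTrue(self.voikko.spell(word), word)

    def test_incorrect_words_rejected(self):
        for word in self.INCORRECT:
            self.assertFalse(self.voikko.spell(word), word)


def run_suite():
    """Run the tests above and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(BasicSpellTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the word lists as class attributes means that someone who knows the 
language only has to edit two lists, which may help when the person running 
the tests does not understand the language.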

Everything mentioned above, including all tools needed for building the 
morphology, should be in such a shape that it would be possible to build Debian 
packages for them and distribute them legally as free software. Unstable APIs 
are not a problem at this point unless the changes are happening daily.

Harri


