[libvoikko] Test cases for libvoikko/HFST needed

Flammie Pirinen flammie at iki.fi
Tue Jan 19 08:38:17 EET 2010

Harri Pitkänen wrote on 18.1.2010 at 19:50:
> There are some things that would speed up the development where you could
> most likely help:
> - Improve HFST public headers so that building libvoikko against HFST
> becomes possible without removing quality checks from our build system. It
> should be possible to include HFST headers in a compilation unit using
>  g++ -Wall -Werror -pedantic

Ah yes, that's one thing that isn't entirely trivial, or at least the
ideal solution exceeds my C++ skills. HFST is just a bridge-like
wrapper over the underlying libraries; it currently includes the
external libraries in its source tree, and some of their definitions
leak into the public installed headers of HFST. The easy way out would
be to fix the underlying libraries so that they stop using e.g.
deprecated data structures in their respective public interfaces, but I
suppose there must be something in the proper bridge etc. design
patterns that does the hiding more elegantly, without the need to
modify the external library code.
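
One pattern of that kind is the opaque-pointer ("pimpl") idiom. The
following is only a minimal sketch under assumptions: the names
sketch::Transducer and Impl are hypothetical and are not part of the
HFST API, and std::set merely stands in for whatever back-end type
would otherwise leak into the installed header. The public part only
forward-declares the implementation struct, so code that includes it
never sees the back-end headers.

```cpp
// ---- public part: what would be installed as a header ----
#include <cstddef>
#include <memory>
#include <string>

namespace sketch {

// Hypothetical wrapper class, for illustration only.
class Transducer {
public:
    explicit Transducer(const std::string& name);
    ~Transducer();  // defined below, where Impl is a complete type
    Transducer(const Transducer&) = delete;
    Transducer& operator=(const Transducer&) = delete;

    void add_symbol(const std::string& symbol);
    bool has_symbol(const std::string& symbol) const;
    std::size_t symbol_count() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // back-end types stay behind this pointer
};

}  // namespace sketch

// ---- private part: compiled into the library, never installed ----
#include <set>  // stand-in for a back-end header (assumption)

namespace sketch {

struct Transducer::Impl {
    std::string name;
    std::set<std::string> symbols;  // back-end data structure, hidden
};

Transducer::Transducer(const std::string& name)
    : impl_(new Impl{name, {}}) {}

Transducer::~Transducer() = default;

void Transducer::add_symbol(const std::string& symbol) {
    impl_->symbols.insert(symbol);
}

bool Transducer::has_symbol(const std::string& symbol) const {
    return impl_->symbols.count(symbol) != 0;
}

std::size_t Transducer::symbol_count() const {
    return impl_->symbols.size();
}

}  // namespace sketch
```

With this split, only the private part has to be compiled with relaxed
warning flags; a client such as libvoikko would include just the public
part and could keep -Wall -Werror -pedantic. The cost is one extra
heap allocation and pointer indirection per object.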

> - Make sure that HFST can be built on Windows using MS Visual C++.

Yes, that's definitely up for grabs for anyone who has experience with
the environment and has access to one. I tend to steer clear of
Microsoft products if at all possible, and I think even my colleague
who may implement Windows support will limit it to MinGW/Cygwin. As far
as I know we don't even have licences for Visual Studio here.

> - Improve src/spellchecker/HfstSpeller.cpp to work with flag diacritics
> (Tommi said he will try to fix this) and

Yes, that shouldn't take long once I take the time to implement it;
I've been tinkering with omorfi's hyphenation and finite-state
suggestion mechanism implementation a bit lately.

> implement checking of correct
> capitalisation.

Is it enough if implementers of morphologies are encouraged to build a
suggestion mechanism that always prefers (initial) capitalisation over
anything else, given that the language in question has capitalisation
of any form? Assuming the suggestion mechanism will eventually be fast
enough, checking capitalisation separately possibly won't give much
advantage. Of course, on the user interface side it should still be
trivial to check whether the capitalised form is the first suggestion
in the list and inform the user appropriately.

One reason for this question is that capitalisation of course has a
few language-dependent cases, e.g. i in Turkish, ij in Dutch, ss in
German and so forth. Also, I'm not sure, but I think some languages may
have more complex capitalisation rules than word-initial?
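
To make the language-dependent cases and the user interface check
concrete, here is a rough sketch. Everything in it is hypothetical: the
Lang enum and the function names are made up for illustration, nothing
here is libvoikko or HFST API, and the case mapping covers only ASCII
a-z plus the Turkish and Dutch cases mentioned above. A real
implementation would need full Unicode case mapping.

```cpp
#include <string>
#include <vector>

// Hypothetical language tags, for illustration only.
enum class Lang { Generic, Turkish, Dutch };

// Upper-case a single code point. Handles ASCII a-z and the Turkish
// dotted/dotless i pair; every other code point is returned unchanged.
char32_t upper_one(char32_t c, Lang lang) {
    if (lang == Lang::Turkish && c == U'i') return U'\u0130';  // i -> İ
    if (lang == Lang::Turkish && c == U'\u0131') return U'I';  // ı -> I
    if (c >= U'a' && c <= U'z') {
        return static_cast<char32_t>(c - (U'a' - U'A'));
    }
    return c;
}

// Capitalise the beginning of a word according to the language.
std::u32string capitalise_initial(const std::u32string& word, Lang lang) {
    if (word.empty()) return word;
    std::u32string out = word;
    out[0] = upper_one(word[0], lang);
    // Dutch: a word-initial "ij" is capitalised as a unit, "IJ".
    if (lang == Lang::Dutch && word.size() >= 2 &&
        word[0] == U'i' && word[1] == U'j') {
        out[1] = U'J';
    }
    return out;
}

// UI-side check: is the top suggestion just the capitalised input?
bool first_suggestion_is_capitalisation(
        const std::u32string& word,
        const std::vector<std::u32string>& suggestions,
        Lang lang) {
    return !suggestions.empty() &&
           suggestions.front() != word &&
           suggestions.front() == capitalise_initial(word, lang);
}
```

For example, capitalise_initial(U"ijsselmeer", Lang::Dutch) yields
"IJsselmeer", while the same input with Lang::Generic yields
"Ijsselmeer". That difference is why the comparison on the user
interface side would have to know the language of the word.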

> - Provide Debian packages for HFST and Sámi morphology.

Is there anything blocking Debian packages of HFST? It uses a mostly
standard autotools setup, and the dependencies, I believe, are
documented in the READMEs. At least the Gentoo packaging went in nicely
with defaults:
<http://git.overlays.gentoo.org/gitweb/?p=proj/sci.git;a=blob;f=sci-misc/hfst/hfst-2.2.ebuild;h=a688b29d682bc321b489faf0c0d399974df72151;hb=HEAD>
