[libvoikko] Test cases for libvoikko/HFST needed
Flammie Pirinen
flammie at iki.fi
Tue Jan 19 08:38:17 EET 2010
Harri Pitkänen wrote on 18.1.2010 at 19.50:
>
> There are some things that would speed up the development where you could
> most likely help:
>
> - Improve HFST public headers so that building libvoikko against HFST
> becomes possible without removing quality checks from our build system.
> It should be possible to include HFST headers in a compilation unit using
> g++ -Wall -Werror -pedantic
Ah yes, that's one thing that isn't entirely trivial, or at least the
ideal solution exceeds my C++ skills. HFST is just a bridge-like
wrapper over the underlying libraries; it currently includes the
external libraries in its source tree, and some of their definitions
leak into the public installed headers of HFST. The easy way out would
be to stop the underlying libraries from using e.g. deprecated data
structures in their respective public interfaces, but I suppose there
must be something in the proper bridge etc. design patterns that does
the hiding more elegantly, without the need to modify the external
library code.
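One common C++ way to do this kind of hiding is the pimpl (opaque pointer) idiom: the installed header only forward-declares an implementation struct, so the external library's headers are included only in the private source file. The following is a minimal single-file sketch of the idea, not HFST's actual API; the names Transducer and external::ExternalFst are hypothetical stand-ins.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// ---- Part 1: what would go into the public, installed header ----
// Hypothetical wrapper class (not the real HFST interface). Nothing from
// the external library is visible here, so strict warning flags only ever
// see this declaration.
class Transducer {
public:
    Transducer();
    ~Transducer();  // defined below, where Impl is a complete type
    Transducer(const Transducer&) = delete;
    Transducer& operator=(const Transducer&) = delete;

    void addWord(const std::string& word);
    bool accepts(const std::string& word) const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // opaque pointer to the hidden state
};

// ---- Part 2: what would go into the private implementation file ----
// Stand-in for a third-party type whose own header might not compile
// cleanly under -Wall -Werror -pedantic (an assumption for illustration).
namespace external {
struct ExternalFst {
    std::vector<std::string> words;
};
}

// Only this translation unit knows that Impl holds an external type.
struct Transducer::Impl {
    external::ExternalFst fst;
};

Transducer::Transducer() : impl_(new Impl()) {}
Transducer::~Transducer() = default;

void Transducer::addWord(const std::string& word) {
    assert(impl_ != nullptr);
    impl_->fst.words.push_back(word);
}

bool Transducer::accepts(const std::string& word) const {
    for (const std::string& w : impl_->fst.words) {
        if (w == word) return true;
    }
    return false;
}
```

With this split, a client such as libvoikko would include only Part 1, and the external headers would need to compile only with whatever flags the wrapper's own build uses.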
> - Make sure that HFST can be built on Windows using MS Visual C++.
Yes, that's definitely up for grabs for anyone who has experience with
the environment and has access to it. I tend to steer clear of Microsoft
products if at all possible, and I think even my colleague who may
implement Windows support will limit it to MinGW/Cygwin. As far as I
know we don't even have licences for Visual Studio here.
> - Improve src/spellchecker/HfstSpeller.cpp to work with flag diacritics
> (Tommi said he will try to fix this) and
Yes, that shouldn't take long once I take the time to implement it; I've
been tinkering with omorfi's hyphenation and finite-state suggestion
mechanism implementation a bit lately.
> implement checking of correct capitalisation.
Is it enough if implementers of morphologies are encouraged to make a
suggestion mechanism that always prefers (initial) capitalisation fixes
over anything else, given that the language in question has
capitalisation of any form? Assuming the suggestion mechanism will
eventually be fast enough, checking capitalisation separately possibly
won't give much of an advantage. Of course, on the user interface side
it should still be trivial to check whether a capitalisation fix is the
first suggestion in the list and inform the user appropriately.
One reason for this question is that capitalisation of course has a few
language-dependent cases, e.g. i in Turkish, ij in Dutch, ss in German
and so forth. Also, I'm not sure, but I think some languages may have
more complex capitalisation rules than just word-initial ones?
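The user-interface check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical and not part of libvoikko or HFST, and the default case folding is naive ASCII-only, which is exactly what would go wrong for cases like Turkish i. That is why the folding is passed in as a parameter that a language-specific implementation could replace.

```cpp
#include <functional>
#include <string>
#include <vector>

// Case-folding hook; language-specific rules would be plugged in here.
using CaseFold = std::function<std::string(const std::string&)>;

// Naive ASCII-only folding, used as a placeholder default. It leaves all
// non-ASCII bytes untouched and knows nothing about Turkish, Dutch or
// German special cases.
std::string asciiFold(const std::string& s) {
    std::string out = s;
    for (char& c : out) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
    return out;
}

// Returns true when the top-ranked suggestion differs from the input
// only in letter case, i.e. the error is (probably) a capitalisation error.
// Assumes the suggestion list is ordered best-first.
bool isCapitalisationError(const std::string& input,
                           const std::vector<std::string>& suggestions,
                           const CaseFold& fold = asciiFold) {
    if (suggestions.empty()) return false;
    const std::string& best = suggestions.front();
    return best != input && fold(best) == fold(input);
}
```

For example, with the input "helsinki" and a suggestion list starting with "Helsinki", the function reports a capitalisation error; with the input "helsinkki" it does not, since the top suggestion also changes the letters.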
> - Provide Debian packages for HFST and Sámi morphology.
Is there anything blocking Debian packages of HFST? It uses a mostly
standard autotools setup, and the dependencies are, I believe, documented
in the READMEs. At least the Gentoo packaging went in nicely with the
defaults:
<http://git.overlays.gentoo.org/gitweb/?p=proj/sci.git;a=blob;f=sci-misc/hfst/hfst-2.2.ebuild;h=a688b29d682bc321b489faf0c0d399974df72151;hb=HEAD>