[libvoikko] Test cases for libvoikko/HFST needed
Flammie Pirinen
flammie at iki.fi
Wed Jan 20 07:47:04 EET 2010
Harri Pitkänen wrote on 19.1.2010 at 20.08:
> On Tuesday 19 January 2010, Flammie Pirinen wrote:
>> Ah yes, that's one thing that isn't entirely trivial, or at least
>> the ideal solution exceeds my C++ skills. HFST is just a bridge-like
>> wrapper over the underlying libraries; it currently includes the
>> external libraries in its source tree, and some of their definitions
>> leak into the public installed headers of hfst. The easy way out
>> would be to fix the underlying libraries so that they stop using e.g.
>> deprecated data structures in their public interfaces, but I suppose
>> there must be something in the proper bridge etc. design patterns
>> that does the hiding more elegantly, without the need to modify the
>> external library code.
>
> I think the correct solution depends on what sort of applications are
> supposed to use this API. If the applications do not need to know
> anything about the underlying libraries you can just remove all the
> functionality that depends on SFST/OpenFST headers and stop including
> those headers.
Yes, that has been my impression of HFST, and I hope there isn't any
software that uses structures or functions of the underlying libraries
directly. The reason I believe that just removing the headers isn't
possible is that the public interface of hfst operates on structures
or classes that have at least private members from the underlying
libraries, which necessitates including the underlying libraries'
headers in the public headers of hfst (I hope that makes sense; I
haven't been actively developing the library side of things myself).
> This is most certainly the case for libvoikko since we basically do
> only lookups and nothing else.
Hopefully, for libvoikko as well as many other end applications, we can
provide lightweight lookup transducers with specialised code for
faster lookup, as Krister said in another mail. This will cut the size
of the library to a fraction, and since it is entirely our own code it
will not have the licensing issues that may be problematic for some
users.
> I have not studied these headers very carefully but it seems that the
> problem may be that HFST is not really providing an abstraction layer.
> It seems to equate weighted transducers with OpenFST and unweighted
> transducers with SFST and use the backend data types directly in
> public headers. Often such types can be replaced with pointers to
> incomplete types or abstract base classes.
Yes, that is certainly the current state of things; the only guarantee
in the current library is that it provides almost the same function
signatures for both back ends. Our svn contains a reformulation in
object-oriented terms that provides a framework for including more
backends, but it seemingly does not escape the requirement of
including the headers of the underlying libraries, as the
implementation classes contain private members whose types are data
structures from the backends.
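
For illustration, here is a minimal sketch of the "pointer to incomplete
type" approach mentioned in the quoted text. It is not HFST code: the
class name Lexicon, its add/lookup methods and the std::multimap that
stands in for a backend transducer are all hypothetical. The point is
only that the part marked as the public header names no backend type, so
it would not need to include any backend header.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// ---- Part that would live in the public, installed header ----
// No backend header is needed here: Impl is only declared, not defined.
// (Lexicon and its methods are hypothetical names, not the HFST API.)
class Lexicon {
public:
    Lexicon();
    ~Lexicon();  // defined below, where Impl is a complete type
    Lexicon(const Lexicon&) = delete;
    Lexicon& operator=(const Lexicon&) = delete;

    void add(const std::string& surface, const std::string& analysis);
    std::vector<std::string> lookup(const std::string& surface) const;

private:
    struct Impl;                  // incomplete type in the public header
    std::unique_ptr<Impl> impl_;  // only a pointer is stored in the class
};

// ---- Part that would live in a private source file ----
// A real implementation would include the backend headers here and keep
// the backend transducer inside Impl. The multimap is only a stand-in.
struct Lexicon::Impl {
    std::multimap<std::string, std::string> entries;
};

Lexicon::Lexicon() : impl_(new Impl) {}
Lexicon::~Lexicon() = default;

void Lexicon::add(const std::string& surface, const std::string& analysis) {
    impl_->entries.emplace(surface, analysis);
}

std::vector<std::string> Lexicon::lookup(const std::string& surface) const {
    std::vector<std::string> analyses;
    auto range = impl_->entries.equal_range(surface);
    for (auto it = range.first; it != range.second; ++it) {
        analyses.push_back(it->second);
    }
    return analyses;
}
```

The destructor has to be defined in the source file, after Impl is
complete, because std::unique_ptr cannot delete an incomplete type. The
cost of the technique is one extra heap allocation and one pointer
indirection per object.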
>
>>> implement checking of correct
>>> capitalisation.
>>
>> Is it enough if implementers of morphologies are encouraged to make a
>> suggestion mechanism, which always prefers (initial) capitalisation
>> over anything else, given that the language in question contains
>> capitalisation of any form?
>>
>> Assuming the suggestion mechanism will
>> eventually be fast enough, it possibly won't give much advantage to
>> check capitalisation separately. Of course on the user interface side
>> it should still be trivial to check whether the capitalisation is the
>> first suggestion in the list and inform the user appropriately.
>
> The advantage is significant at least with Malaga since we are now
> able to implement various modes for checking capitalisation while
> doing only one analysis operation per word.
Theoretically, the different modes of capitalisation would require
either their own suggestion transducers or a short passage of code
that allows or skips entries depending on some settings. Of course,
the version where you capitalise first and test whether that alone
results in a correct spelling will be cheaper, since it requires
neither different suggestion relations (e.g. you might have edit
distance plus initial capitalisation and edit distance without caps as
separate transducers) nor handling the suggestions in C code.
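
As a rough sketch of the "capitalise first and test" variant, assuming
the speller is available as a plain yes/no predicate: the names
SpellResult, capitalise_first and check_with_capitalisation are
hypothetical, the accepts callback stands in for whatever lookup the
transducer provides, and the capitalisation is ASCII-only, which would
not be enough for real Finnish input.

```cpp
#include <cctype>
#include <functional>
#include <string>

// Possible outcomes of the check (hypothetical names).
enum class SpellResult { Ok, CapitaliseFirst, Unknown };

// Upper-case the first byte of the word. ASCII only: a real
// implementation would need Unicode-aware case mapping (e.g. for "ä").
std::string capitalise_first(std::string word) {
    if (!word.empty()) {
        word[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(word[0])));
    }
    return word;
}

// `accepts` is an assumed stand-in for a speller lookup that returns
// true when the given form is a correctly spelled word.
SpellResult check_with_capitalisation(
    const std::string& word,
    const std::function<bool(const std::string&)>& accepts) {
    if (accepts(word)) {
        return SpellResult::Ok;
    }
    // Second lookup only when capitalising actually changes the word.
    const std::string capitalised = capitalise_first(word);
    if (capitalised != word && accepts(capitalised)) {
        return SpellResult::CapitaliseFirst;
    }
    return SpellResult::Unknown;
}
```

Note that this does up to two lookups for a misspelled word, whereas the
Malaga approach described in the quoted text needs only one analysis per
word.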
>> Is there anything blocking debian packages of HFST?
>> [...]
>
> Probably nothing is blocking it. The included copies of SFST and
> OpenFST could be an issue for the distributions that care about such
> things.
Oh yes, of course: bundling certainly prevents the HFST ebuild from
entering Gentoo's main repository; I hadn't even thought of having it
outside the science repo. Hopefully, if the library gains enough
importance, experience will be available for debundling the backend
libraries as well.