[libvoikko] grammar checker checks
Harri Pitkänen
hatapitk at iki.fi
Wed Sep 18 18:06:42 EEST 2013
On Wednesday 18 September 2013 11:06:57 Francis Tyers wrote:
> At the moment in the grammar checker there are two places with checks:
>
> 1) checks.cpp
> 2) check/
This is also a historical artifact. checks.cpp should be removed and each
check that is implemented there should be moved into its own C++ class under
the check/ subdirectory. Of course you don't need to do all that; it is
something I should do once I find time.
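As a rough illustration of "one check per class", here is a minimal sketch. All names in it (GrammarError, Check, ExtraWhitespaceCheck, the run() signature) are made up for this example and are not the actual libvoikko classes; the real checks work on libvoikko's own sentence and token structures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace grammar {

// Hypothetical error record, for illustration only.
struct GrammarError {
    std::size_t startPos;  // offset of the first offending character
    std::size_t length;    // number of offending characters
    std::string message;
};

namespace check {

// Hypothetical common interface that every check class would implement.
class Check {
public:
    virtual ~Check() = default;
    virtual void run(const std::string& sentence,
                     std::vector<GrammarError>& errors) const = 0;
};

// One check, one class: reports every run of two or more spaces.
class ExtraWhitespaceCheck : public Check {
public:
    void run(const std::string& sentence,
             std::vector<GrammarError>& errors) const override {
        std::size_t i = 0;
        while (i < sentence.size()) {
            if (sentence[i] != ' ') {
                ++i;
                continue;
            }
            // Find the end of this run of spaces.
            std::size_t j = i;
            while (j < sentence.size() && sentence[j] == ' ') {
                ++j;
            }
            if (j - i > 1) {
                errors.push_back({i, j - i, "extra whitespace"});
            }
            i = j;
        }
    }
};

}  // namespace check
}  // namespace grammar
```

With this layout each check lives in its own file under check/ and can be tested in isolation, which is the main point of retiring checks.cpp.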
> What I would like to do is move these to a class, called
> MalagaRuleEngine which extends the RuleEngine class. This class will be
> used for containing all the rules/checks. I've started this, you can see
> the attached diff. I would appreciate any comments. I have run the code
> and it seems to work (the blue underlines come up in LibreOffice).
Looks good to me. However, I would prefer that you name the legacy code
FinnishRuleEngine instead of MalagaRuleEngine. There is almost nothing
Malaga-dependent in there (even if it might seem like there is).
In fact, I just checked that I can use the grammar checker with the
experimental Finnish VFST backend, so there really is no dependency on Malaga.
We will perhaps disable the Malaga backend in the default configuration next
year and may even remove it completely in a few years. The grammar checker
should not be affected by that.
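To make the naming argument concrete, here is a minimal sketch of a rule engine that only sees an abstract analyzer, so it does not matter whether Malaga or VFST sits behind it. Only the names RuleEngine and FinnishRuleEngine come from this thread; the Analyzer interface, ToyAnalyzer, the check() signature and the toy repeated-word rule are assumptions made for the example, not the real libvoikko API.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace grammar {

// Hypothetical stand-in for a morphology backend (Malaga, VFST, ...).
class Analyzer {
public:
    virtual ~Analyzer() = default;
    virtual bool isKnownWord(const std::string& word) const = 0;
};

// Base class for language-specific rule engines (interface is assumed).
class RuleEngine {
public:
    virtual ~RuleEngine() = default;
    virtual std::vector<std::string> check(
        const std::vector<std::string>& words) const = 0;
};

// Named after the language, not the backend: it depends only on Analyzer.
class FinnishRuleEngine : public RuleEngine {
public:
    explicit FinnishRuleEngine(std::shared_ptr<const Analyzer> analyzer)
        : analyzer_(std::move(analyzer)) {}

    // Toy rule: report a known word that is immediately repeated.
    std::vector<std::string> check(
        const std::vector<std::string>& words) const override {
        std::vector<std::string> findings;
        for (std::size_t i = 1; i < words.size(); ++i) {
            if (words[i] == words[i - 1] && analyzer_->isKnownWord(words[i])) {
                findings.push_back("repeated word: " + words[i]);
            }
        }
        return findings;
    }

private:
    std::shared_ptr<const Analyzer> analyzer_;
};

// Trivial backend used only for this sketch: every non-empty word is known.
class ToyAnalyzer : public Analyzer {
public:
    bool isKnownWord(const std::string& word) const override {
        return !word.empty();
    }
};

}  // namespace grammar
```

Swapping ToyAnalyzer for another Analyzer implementation requires no change in FinnishRuleEngine, which is why a backend name in the class name would be misleading.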
> 1) Is there a reason why there is no Makefile.am in grammar/ ?
The main reason is that it allows seeing in one place which source files are
included in or excluded from the build with a specific configuration switch.
If each directory had its own Makefile.am, lots of the conditionals would need to be
duplicated. For example, HFST-specific backend files are located in three
different subdirectories (because there are three different HFST backends).
> 2) Is there a reason why there is a check namespace ?
Just consistency. Programmers coming from a Java or C# background will find it
more intuitive when subdirectories and programming language namespaces match.
> 3) In the long term, might it be possible to replace the C++ checks with
> a constraint grammar file ?
I would really like to do that, at least for those checks that could be
implemented with a constraint grammar, but in the near future that does not
seem to be possible. The license of vislcg3 is the problem here. Currently the
Finnish grammar checker, under its MPL/GPL/LGPL tri-license, could (in theory)
be integrated into LibreOffice core, but vislcg3 can only be used in an
extension.
> 4) Do you have any test cases for the Finnish grammar checker ? -- At
> the moment I'm just pasting in paragraphs and randomly removing words :)
We have quite a good integration test suite, but unfortunately setting it up
is a bit complicated:
https://github.com/voikko/corevoikko/wiki/libvoikko-IntegrationTesting
Since your commits will most likely not affect the Finnish checker very often,
you could just try some of the test cases with the command-line tool voikkogc:
https://github.com/voikko/corevoikko/blob/master/tests/voikkotest/fi-x-malstd/grammar.txt
and then send me the patch. I will run the whole test suite for you before
checking it in.
By the way, you probably need commit access to the corevoikko repository for
your work. If you send me your GitHub user name, I will add you to the
committers group. Are there others who would need to do commits there?
Harri