[libvoikko] grammar checker checks

Harri Pitkänen hatapitk at iki.fi
Wed Sep 18 18:06:42 EEST 2013


On Wednesday 18 September 2013 11:06:57 Francis Tyers wrote:
> At the moment in the grammar checker there are two places with checks:
> 
> 1) checks.cpp
> 2) check/

This is also a historical artifact. checks.cpp should be removed and each 
check that is implemented there should be moved into its own C++ class under 
the check/ subdirectory. Of course you don't need to do all that; it is 
something I should do once I find time.

> What I would like to do is move these to a class, called
> MalagaRuleEngine which extends the RuleEngine class. This class will be
> used for containing all the rules/checks. I've started this, you can see
> the attached diff. I would appreciate any comments. I have run the code
> and it seems to work (the blue underlines come up in LibreOffice).

Looks good to me. However, I would prefer that you name the legacy code 
FinnishRuleEngine instead of MalagaRuleEngine. There is almost nothing 
Malaga-dependent in there (even if it might seem like there is).

In fact, I just checked that I can use the grammar checker with the 
experimental Finnish VFST backend, so there really is no dependency on Malaga. 
We will perhaps disable the Malaga backend in the default configuration next 
year and may even remove it completely in a few years. The grammar checker 
should not be affected by that.
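
To make the proposed layout concrete, here is a rough C++ sketch of a rule 
engine interface, a language-specific engine named after the language rather 
than the backend, and one check living in its own class. Every name in it 
(the sketch namespace, Error, Check, ExtraWhitespaceCheck and the method 
signatures) is hypothetical and simplified for illustration; it is not the 
actual libvoikko code or the attached diff.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical, simplified stand-in for a grammar error record.
struct Error {
    std::size_t start;    // offset of the error in the paragraph
    std::size_t length;   // length of the flagged span
    std::string message;  // short description
};

// Hypothetical base class: each check lives in its own class.
class Check {
public:
    virtual ~Check() = default;
    virtual void run(const std::string & paragraph,
                     std::vector<Error> & errors) const = 0;
};

// Illustrative check: flags every pair of consecutive spaces.
class ExtraWhitespaceCheck : public Check {
public:
    void run(const std::string & paragraph,
             std::vector<Error> & errors) const override {
        std::size_t pos = paragraph.find("  ");
        while (pos != std::string::npos) {
            errors.push_back({pos, 2, "extra whitespace"});
            pos = paragraph.find("  ", pos + 2);
        }
    }
};

// Hypothetical rule engine interface.
class RuleEngine {
public:
    virtual ~RuleEngine() = default;
    virtual std::vector<Error> check(const std::string & paragraph) const = 0;
};

// Engine named after the language; nothing here depends on the
// morphology backend (Malaga, VFST, ...).
class FinnishRuleEngine : public RuleEngine {
public:
    FinnishRuleEngine() {
        checks.push_back(std::make_unique<ExtraWhitespaceCheck>());
    }

    std::vector<Error> check(const std::string & paragraph) const override {
        std::vector<Error> errors;
        for (const auto & c : checks) {
            c->run(paragraph, errors);
        }
        return errors;
    }

private:
    std::vector<std::unique_ptr<Check>> checks;
};

} // namespace sketch
```

With this shape, moving a check out of a monolithic file amounts to adding 
one more Check subclass and registering it in the engine's constructor.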

> 1) Is there a reason why there is no Makefile.am in grammar/ ?

The main reason is that it allows seeing in one place which source files are 
included in or excluded from the build by a specific configuration switch. If 
each directory had its own Makefile.am, many of the conditionals would need to 
be duplicated. For example, HFST-specific backend files are located in three 
different subdirectories (because there are four different HFST backends).

> 2) Is there a reason why there is a check namespace ?

Just consistency. Programmers coming from a Java or C# background will find it 
more intuitive when subdirectories and programming language namespaces match.
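
As a small illustration of that convention, the sketch below shows a class 
whose namespace mirrors its directory, and a caller one level up referring to 
it through the nested namespace. The file paths in the comments, the class 
ExampleCheck and the function firstCheckName are made up for the example.

```cpp
#include <string>

// Hypothetical file: grammar/check/ExampleCheck.hpp
// The namespace path mirrors the directory path.
namespace libvoikko { namespace grammar { namespace check {

class ExampleCheck {
public:
    std::string name() const { return "ExampleCheck"; }
};

} } } // namespace libvoikko::grammar::check

// Hypothetical file: grammar/ExampleEngine.cpp
// Code in grammar/ reaches the class via the check:: prefix, just as
// it reaches the file via the check/ directory.
namespace libvoikko { namespace grammar {

std::string firstCheckName() {
    check::ExampleCheck c;
    return c.name();
}

} } // namespace libvoikko::grammar
```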

> 3) In the long term, might it be possible to replace the C++ checks with
> a constraint grammar file ?

I would really like to do that, at least for those checks that could be 
implemented with a constraint grammar. But in the near future that does not 
seem to be possible. Here the license of vislcg3 is a problem. Currently the 
Finnish grammar checker, under its MPL/GPL/LGPL tri-license, could (in theory) 
be integrated into LibreOffice core, but vislcg3 can only be used in an 
extension.

> 4) Do you have any test cases for the Finnish grammar checker ? -- At
> the moment I'm just pasting in paragraphs and randomly removing words :)

We have quite a good integration test suite, but unfortunately setting it up 
is a bit complicated:

  https://github.com/voikko/corevoikko/wiki/libvoikko-IntegrationTesting

Since your commits will most likely not affect the Finnish checker very often, 
you could just try some of the test cases with the command-line voikkogc tool:

  https://github.com/voikko/corevoikko/blob/master/tests/voikkotest/fi-x-malstd/grammar.txt

and then send me the patch. I will run the whole test suite for you before 
checking it in.

By the way, you probably need commit access to the corevoikko repository for 
your work. If you send me your GitHub user name, I will add you to the 
committers group. Are there others who would need to do commits there?

Harri


