[libvoikko] grammar checker checks

Francis Tyers ftyers at prompsit.com
Thu Sep 19 11:33:08 EEST 2013


On Wed, 18 Sep 2013 at 18:06 +0300, Harri Pitkänen wrote:
> On Wednesday 18 September 2013 11:06:57 Francis Tyers wrote:
> > At the moment in the grammar checker there are two places with checks:
> > 
> > 1) checks.cpp
> > 2) check/
> 
> This is also a historical artifact. checks.cpp should be removed and each
> check that is implemented there should be moved into its own C++ class under
> the check/ subdirectory. Of course you don't need to do all that; it is
> something I should do once I find time.

Ok. To start off with, I've just moved the checks.cpp file under check/.

> > What I would like to do is move these to a class, called
> > MalagaRuleEngine which extends the RuleEngine class. This class will be
> > used for containing all the rules/checks. I've started this, you can see
> > the attached diff. I would appreciate any comments. I have run the code
> > and it seems to work (the blue underlines come up in LibreOffice).
> 
> Looks good to me. However I would prefer if you named the legacy code as 
> FinnishRuleEngine instead of MalagaRuleEngine. There is almost nothing Malaga 
> dependent in there (even if it might seem like there is).
> 
> In fact I just checked that I can use the grammar checker with the 
> experimental Finnish VFST backend so there really is no dependency on Malaga. 
> We will disable the Malaga backend from default configuration perhaps next 
> year and may even remove it completely in a few years. The grammar checker 
> should not be affected by that.

Ok, FinnishRuleEngine works fine too.
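
For reference, the layout being discussed could look roughly like the
sketch below. It is only an illustration: the GrammarError struct, the
Check base class, the check() and run() signatures and the
ExtraSpaceCheck example are all hypothetical stand-ins, not the actual
libvoikko declarations. The only names taken from this thread are
RuleEngine and FinnishRuleEngine.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical error record; the real libvoikko type differs.
struct GrammarError {
    std::size_t startPos;
    std::size_t length;
    std::string description;
};

// Assumed shape of the base class that every rule engine extends.
class RuleEngine {
public:
    virtual ~RuleEngine() = default;
    // Run all checks owned by this engine over one paragraph.
    virtual std::vector<GrammarError> check(const std::string & paragraph) const = 0;
};

// One check per class, in the spirit of the check/ subdirectory.
class Check {
public:
    virtual ~Check() = default;
    virtual void run(const std::string & paragraph,
                     std::vector<GrammarError> & out) const = 0;
};

// Illustrative check: reports each run of two or more spaces.
class ExtraSpaceCheck : public Check {
public:
    void run(const std::string & paragraph,
             std::vector<GrammarError> & out) const override {
        std::size_t i = 0;
        while (i < paragraph.size()) {
            if (paragraph[i] != ' ') {
                ++i;
                continue;
            }
            const std::size_t start = i;
            while (i < paragraph.size() && paragraph[i] == ' ') {
                ++i;
            }
            if (i - start > 1) {
                out.push_back({start, i - start, "extra whitespace"});
            }
        }
    }
};

// Container for the legacy Finnish checks.
class FinnishRuleEngine : public RuleEngine {
public:
    FinnishRuleEngine() {
        checks.push_back(std::make_unique<ExtraSpaceCheck>());
    }
    std::vector<GrammarError> check(const std::string & paragraph) const override {
        std::vector<GrammarError> errors;
        for (const auto & c : checks) {
            c->run(paragraph, errors);
        }
        return errors;
    }
private:
    std::vector<std::unique_ptr<Check>> checks;
};
```

With this shape, moving a check out of checks.cpp would amount to
adding one more Check subclass and registering it in the engine's
constructor.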

> > 1) Is there a reason why there is no Makefile.am in grammar/ ?
> 
> The main reason is that it allows seeing in one place which source files are 
> included/excluded from build with a specific configuration switch. If each 
> directory had its own Makefile.am lots of the conditionals would need to be 
> duplicated. For example HFST specific backend files are located in three 
> different subdirectories (because there are four different HFST backends).

Ok, fair enough.

> > 2) Is there a reason why there is a check namespace ?
> 
> Just consistency. Programmers coming from Java or C# background will find it 
> more intuitive when subdirectories and programming language namespaces match.

Ah ok, I've never really used those languages.

> > 3) In the long term, might it be possible to replace the C++ checks with
> > a constraint grammar file ?
> 
> I really would like to do that at least for those checks that could be 
> implemented with a constraint grammar. But in the near future that does not 
> seem to be possible. Here the license of vislcg3 is a problem. Currently the 
> Finnish grammar checker under its MPL/GPL/LGPL tri-license could (in theory) 
> be integrated into LibreOffice core but vislcg3 can only be used in an 
> extension.

The authors of VISLCG3 would probably be quite happy to consider
trilicensing it. I put Tino Didriksen in copy, who is the main
developer.

I ask because many of the >100 line C++ files could be replaced with a
single line of CG.

> > 4) Do you have any test cases for the Finnish grammar checker ? -- At
> > the moment I'm just pasting in paragraphs and randomly removing words :)
> 
> We have quite a good integration test suite but unfortunately setting it up
> is a bit complicated:
> 
>   https://github.com/voikko/corevoikko/wiki/libvoikko-IntegrationTesting
> 
> Since your commits will most likely not affect the Finnish checker very often 
> you could just try some of the test cases with command line voikkogc:
> 
> https://github.com/voikko/corevoikko/blob/master/tests/voikkotest/fi-x-malstd/grammar.txt
> 
> and then send me the patch. I will run the whole test suite for you before 
> checking it in.
> 
> By the way, you probably need commit access to the corevoikko repository for 
> your work. If you send me your GitHub user name I will add you into the 
> committers group. Are there others who would need to do commits there?

My GitHub username is 'ftyers'. This is the first time I've used git;
I'm more used to SVN. Is there a way I can just work in a branch, so
that we can merge when necessary? It's unlikely that I'll be working
outside of the src/grammar directory anyway.

Fran



