[libvoikko] grammar checker checks
Francis Tyers
ftyers at prompsit.com
Thu Sep 19 11:33:08 EEST 2013
On Wed, 18 Sep 2013 at 18:06 +0300, Harri Pitkänen wrote:
> On Wednesday 18 September 2013 11:06:57 Francis Tyers wrote:
> > At the moment in the grammar checker there are two places with checks:
> >
> > 1) checks.cpp
> > 2) check/
>
> This is also an historical artifact. checks.cpp should be removed and each
> check that is implemented there should be moved into its own C++ class under
> check/ subdirectory. Of course you don't need to do all that, it is something
> I should do once I find time.
Ok. To start off with I've just moved the checks.cpp file under check/.
> > What I would like to do is move these to a class, called
> > MalagaRuleEngine which extends the RuleEngine class. This class will be
> > used for containing all the rules/checks. I've started this, you can see
> > the attached diff. I would appreciate any comments. I have run the code
> > and it seems to work (the blue underlines come up in LibreOffice).
>
> Looks good to me. However I would prefer if you named the legacy code as
> FinnishRuleEngine instead of MalagaRuleEngine. There is almost nothing Malaga
> dependent in there (even if it might seem like there is).
>
> In fact I just checked that I can use the grammar checker with the
> experimental Finnish VFST backend so there really is no dependency on Malaga.
> We will disable the Malaga backend from default configuration perhaps next
> year and may even remove it completely in a few years. The grammar checker
> should not be affected by that.
Ok, FinnishRuleEngine works fine too.
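For anyone following the thread, the structure being discussed can be sketched roughly as below. This is only an illustration under stated assumptions, not the libvoikko code: RuleEngine and FinnishRuleEngine are the names from the discussion, while GrammarCheck, GrammarError, DoubleSpaceCheck and every method signature are hypothetical and invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical error record; the real libvoikko type is not shown in this thread.
struct GrammarError {
    std::size_t startPos;
    std::size_t length;
    std::string message;
};

// Hypothetical interface: one class per check, as proposed for check/.
class GrammarCheck {
public:
    virtual ~GrammarCheck() = default;
    virtual void check(const std::string& text,
                       std::vector<GrammarError>& errors) const = 0;
};

// Example check (invented): reports every run of two or more spaces.
class DoubleSpaceCheck : public GrammarCheck {
public:
    void check(const std::string& text,
               std::vector<GrammarError>& errors) const override {
        std::size_t i = 0;
        while (i < text.size()) {
            if (text[i] != ' ') { ++i; continue; }
            const std::size_t start = i;
            while (i < text.size() && text[i] == ' ') ++i;
            if (i - start >= 2) {
                errors.push_back({start, i - start, "extra whitespace"});
            }
        }
    }
};

// Base class named in the thread; this interface is an assumption.
class RuleEngine {
public:
    virtual ~RuleEngine() = default;
    virtual std::vector<GrammarError> run(const std::string& text) const = 0;
};

// Language-specific engine that owns its checks and runs them in order.
class FinnishRuleEngine : public RuleEngine {
public:
    void addCheck(std::unique_ptr<GrammarCheck> check) {
        assert(check != nullptr);
        checks.push_back(std::move(check));
    }
    std::vector<GrammarError> run(const std::string& text) const override {
        std::vector<GrammarError> errors;
        for (const auto& c : checks) c->check(text, errors);
        return errors;
    }
private:
    std::vector<std::unique_ptr<GrammarCheck>> checks;
};
```

With a layout like this, a second language would only need its own RuleEngine subclass and its own set of check classes; nothing in the base class would have to know about Finnish or Malaga.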
> > 1) Is there a reason why there is no Makefile.am in grammar/ ?
>
> The main reason is that it allows seeing in one place which source files are
> included/excluded from build with a specific configuration switch. If each
> directory had its own Makefile.am lots of the conditionals would need to be
> duplicated. For example HFST specific backend files are located in three
> different subdirectories (because there are four different HFST backends).
Ok, fair enough.
> > 2) Is there a reason why there is a check namespace ?
>
> Just consistency. Programmers coming from Java or C# background will find it
> more intuitive when subdirectories and programming language namespaces match.
Ah ok, I've never really used those languages.
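For reference, the convention Harri describes would look something like this in C++. It is a sketch only: the file path in the comment and the ExampleCheck class are hypothetical, and the nested namespace names are assumed to mirror the directory names.

```cpp
#include <string>

// Assumed layout: a file such as src/grammar/check/ExampleCheck.hpp
// declares its class in the namespace that matches its directory.
namespace libvoikko { namespace grammar { namespace check {

class ExampleCheck {
public:
    // Returns the directory this class would live in under the convention.
    static std::string sourceDirectory() { return "grammar/check"; }
};

} } }
```

The benefit is purely navigational: given a fully qualified name such as libvoikko::grammar::check::ExampleCheck, the directory holding the source file can be read straight off the namespace.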
> > 3) In the long term, might it be possible to replace the C++ checks with
> > a constraint grammar file ?
>
> I really would like to do that at least for those checks that could be
> implemented with a constraint grammar. But in the near future that does not
> seem to be possible. Here the license of vislcg3 is a problem. Currently the
> Finnish grammar checker under its MPL/GPL/LGPL tri-license could (in theory)
> be integrated into LibreOffice core but vislcg3 can only be used in an
> extension.
The authors of VISLCG3 would probably be quite happy to consider
trilicensing it. I put Tino Didriksen in copy, who is the main
developer.
I ask because many of the C++ files of more than 100 lines could each be
replaced with a single line of CG.
> > 4) Do you have any test cases for the Finnish grammar checker ? -- At
> > the moment I'm just pasting in paragraphs and randomly removing words :)
>
> We have a quite good integration test suite but unfortunately setting it up is
> a bit complicated:
>
> https://github.com/voikko/corevoikko/wiki/libvoikko-IntegrationTesting
>
> Since your commits will most likely not affect the Finnish checker very often
> you could just try some of the test cases with command line voikkogc:
>
> https://github.com/voikko/corevoikko/blob/master/tests/voikkotest/fi-x-malstd/grammar.txt
>
> and then send me the patch. I will run the whole test suite for you before
> checking it in.
>
> By the way, you probably need commit access to the corevoikko repository for
> your work. If you send me your GitHub user name I will add you into the
> committers group. Are there others who would need to do commits there?
My GitHub username is 'ftyers'. This is the first time I've used Git --
I'm more used to SVN. Is there a way I can just work in a branch ? And
then we can merge in when necessary ? It's unlikely that I'll be working
outside of the src/grammar directory anyway.
Fran