[libvoikko] grammar checking in libvoikko

Francis Tyers ftyers at prompsit.com
Tue Sep 17 16:43:55 EEST 2013


On Mon, 16 Sep 2013 at 18:26 +0300, Harri Pitkänen wrote:
> Hi!
> 
> On Monday 16 September 2013 11:45:31 Francis Tyers wrote:
> > * Is the full analysis of HFST morphologies available through libvoikko,
> > or do we just have access to the surface string ?
> 
> It used to be but the feature was removed when support for ZHFST spellers was 
> added. This was because we dropped the dependency on hfst-optimized-lookup and 
> started to use hfstospell instead.
> 
> HfstAnalyzer is still there (but the implementation is essentially empty) so 
> you should be able to resurrect the feature from Git history quite easily:
> 
>   https://github.com/voikko/corevoikko/blob/master/libvoikko/src/morphology/HfstAnalyzer.cpp

How would you recommend distributing the data for the grammar checker?
In a single file, as with .zhfst? The checker needs two files:
(1) the descriptive morphological analyser and (2) the grammar checker
rules. The first will be an HFSTOL transducer, and the second a VISLCG3
binary format file. Do you have any preference for how it should be laid
out, e.g. a zip file in ~/.voikko/ or something else?

> > * I see there is some existing work in libvoikko/src/grammar/ -- could /
> > should this infrastructure be reused ?
> 
> You need to plug in your grammar checker by replacing lines 84 - 134 with your 
> checker in this source file:
> 
>   https://github.com/voikko/corevoikko/blob/master/libvoikko/src/grammar/cache.cpp
> 
> "const wchar_t * text" contains the text to be checked. For each error you 
> find you create a CacheEntry object and call gc_cache_append_error to push the 
> error into the cache.
> 
> To do all this without breaking the existing code we need to build an abstract 
> GrammarChecker superclass and extend it with two subclasses, one for the 
> existing implementation and another for your new implementation. Exactly the 
> same has been done with Analyzer, SpellChecker and others. I can help you with 
> that and some other small things that will be needed, such as changes to the 
> LibreOffice plugin.
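
For what it's worth, here is a minimal sketch of how I read the superclass
idea. Apart from the name GrammarChecker, everything in it is made up for
illustration (GrammarErrorSketch, the check() signature, the subclass names
and makeGrammarChecker) and is not libvoikko's actual API. The double-space
rule only stands in for the existing checks, and the CG-based subclass is an
empty placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical error record; NOT libvoikko's real CacheEntry.
struct GrammarErrorSketch {
    std::size_t startPos;  // offset of the error in the paragraph
    std::size_t length;    // number of characters covered
    int errorCode;         // illustrative code, 1 = extra whitespace
};

// Abstract interface: each backend turns a paragraph into a list of errors.
class GrammarChecker {
public:
    virtual ~GrammarChecker() = default;
    virtual std::vector<GrammarErrorSketch> check(const wchar_t * text,
                                                  std::size_t textlen) const = 0;
};

// Stand-in for the existing implementation: flags runs of two or more spaces.
class LegacyGrammarChecker : public GrammarChecker {
public:
    std::vector<GrammarErrorSketch> check(const wchar_t * text,
                                          std::size_t textlen) const override {
        assert(text != nullptr || textlen == 0);
        std::vector<GrammarErrorSketch> errors;
        for (std::size_t i = 0; i + 1 < textlen; ++i) {
            if (text[i] == L' ' && text[i + 1] == L' ') {
                std::size_t start = i;
                while (i < textlen && text[i] == L' ') {
                    ++i;
                }
                errors.push_back({start, i - start, 1});
            }
        }
        return errors;
    }
};

// Placeholder for the new backend: the HFST analysis and CG rule
// application would go here; this sketch reports nothing.
class CgGrammarChecker : public GrammarChecker {
public:
    std::vector<GrammarErrorSketch> check(const wchar_t * text,
                                          std::size_t textlen) const override {
        assert(text != nullptr || textlen == 0);
        return {};
    }
};

// Hypothetical factory: the caller only ever sees the abstract interface.
std::unique_ptr<GrammarChecker> makeGrammarChecker(bool useCg) {
    if (useCg) {
        return std::unique_ptr<GrammarChecker>(new CgGrammarChecker());
    }
    return std::unique_ptr<GrammarChecker>(new LegacyGrammarChecker());
}
```

The caller would then loop over the returned errors and push each one into
the cache, so the cache code never needs to know which backend is in use.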

Great, thanks! When you say lines 84-134, do you mean this method?

void gc_paragraph_to_cache(voikko_options_t * voikkoOptions,
                           const wchar_t * text, size_t textlen) {

As far as I can see, I need to replace:

analysis.cpp : gc_analyze_paragraph
               gc_analyze_sentence
               gc_analyze_token

with methods that use the HFST optimised lookup library to analyse
individual words. Actually, probably only gc_analyze_token.

Then I need to replace: 

cache.cpp : gc_paragraph_to_cache

with a method that takes the sentences with analyses from HFST,
passes each one through the CG and collects the error tags.

Is that right?
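
Roughly, the flow I have in mind looks like the sketch below. It does not
call the real HFST or vislcg3 libraries: TokenAnalyzer and ConstraintGrammar
are hypothetical callbacks standing in for them, and all type and function
names are invented for illustration. It also assumes that error tags can be
recognised by a leading '&', which depends on how the rules are written.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One token together with its morphological readings.
struct AnalyzedToken {
    std::wstring surface;
    std::vector<std::wstring> readings;
};

using Sentence = std::vector<AnalyzedToken>;

// Hypothetical stand-in for the HFST lookup: surface form -> readings.
using TokenAnalyzer =
    std::function<std::vector<std::wstring>(const std::wstring &)>;

// Hypothetical stand-in for applying the CG rules: returns, for each
// token, the tags left on it after the rules have run.
using ConstraintGrammar =
    std::function<std::vector<std::vector<std::wstring>>(const Sentence &)>;

struct ErrorTag {
    std::size_t tokenIndex;
    std::wstring tag;
};

// Analyse every token, run the rules once per sentence, collect error tags.
std::vector<ErrorTag> checkSentence(const std::vector<std::wstring> & tokens,
                                    const TokenAnalyzer & analyze,
                                    const ConstraintGrammar & applyRules) {
    assert(analyze && applyRules);
    Sentence sentence;
    for (const auto & token : tokens) {
        sentence.push_back({token, analyze(token)});
    }
    const auto tagsPerToken = applyRules(sentence);
    std::vector<ErrorTag> errors;
    for (std::size_t i = 0; i < tagsPerToken.size() && i < sentence.size(); ++i) {
        for (const auto & tag : tagsPerToken[i]) {
            // Assumption: error tags are marked with a leading '&'.
            if (!tag.empty() && tag[0] == L'&') {
                errors.push_back({i, tag});
            }
        }
    }
    return errors;
}
```

Each collected tag would then be turned into a cache entry for the token at
tokenIndex, which is the part that replaces gc_paragraph_to_cache.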

> > * Implementing this will involve adding a dependency on vislcg3 -- what
> > is the best way that this could be done ?
> 
> I'm not very familiar with vislcg3. If it is a well-behaved library you 
> should be able to just dynamically link with it, just as we do with 
> hfstospell:
> 
>   https://github.com/voikko/corevoikko/blob/master/libvoikko/configure.ac
> 
> The license appears to be GPLv3+. That is OK for an optional dependency. So 
> you need to add a compile time conditional (disabled by default) to enable 
> this new feature.

It seems to be fairly well behaved. Some work needs to be done on the
library end, including adding pkg-config support, but in principle it
should be possible to make it work just like hfst-ospell.

Fran



