[libvoikko] grammar checking in libvoikko

Francis Tyers ftyers at prompsit.com
Tue Sep 17 17:32:00 EEST 2013


On Tue, 17 Sep 2013 at 16:20 +0200, Sjur Moshagen wrote:
> On 17 Sep 2013 at 15:43, Francis Tyers <ftyers at prompsit.com> wrote:
> 
> > How would you recommend the data for the grammar checker be
> > distributed? In a single file, as with .zhfst? The checker needs two
> > files: (1) the descriptive morphological analyser, and (2) the grammar
> > checker rules. The first will be an HFSTOL transducer, and the second a
> > VISLCG3 binary format file. Do you have any preference for how it should
> > be laid out? E.g. a zip file in ~/.voikko/, or something else?
> 
> Please note that you may need more:
> 
> * a descriptive morphological analyser
> * a disambiguator/syntactic analyser adapted for GC purposes
> * an error identification module (also constraint grammar)
> * one or more modules to generate suggestions for corrections

Good point. I was assuming that the second and third items (the
disambiguator and the error identification module) would be bundled
together, e.g. in the same CG file.
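
On the packaging question, here is a minimal sketch of how a single zip
bundle could be sanity-checked before loading. The member names
(analyser.hfstol, grammarchecker.bin) are hypothetical placeholders, not an
agreed layout, and the archive is built in memory purely for illustration.

```python
import io
import zipfile

# Hypothetical member names; nothing here is an agreed layout.
REQUIRED = ("analyser.hfstol", "grammarchecker.bin")

def missing_members(archive_bytes):
    """Return the required member names that are absent from the zip."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        names = set(zf.namelist())
    return [name for name in REQUIRED if name not in names]

# Build a tiny demo archive in memory with empty placeholder members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("analyser.hfstol", b"")
    zf.writestr("grammarchecker.bin", b"")

print(missing_members(buf.getvalue()))
```

If more modules turn out to be needed, only the REQUIRED tuple would have
to grow.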

The suggestions are another question. My preference is for the CG error
tags to be mapped to human-readable descriptions in a TSV or XML file.
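
As a minimal sketch of the TSV option: assuming a hypothetical
three-column layout (CG error tag, short description, longer help text)
and made-up tag names, the lookup could be as simple as this.

```python
import csv
import io

# Hypothetical TSV content: error tag, short description, help text.
# The tag names and the column layout are assumptions for illustration only.
SAMPLE_TSV = (
    "&agr-error\tAgreement error\tThe verb does not agree with its subject.\n"
    "&real-word-error\tPossible wrong word\tThis word exists but looks out of place here.\n"
)

def load_error_descriptions(handle):
    """Map each CG error tag to a (description, help_text) pair."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    table = {}
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        tag, description, help_text = row[0], row[1], row[2]
        table[tag] = (description, help_text)
    return table

descriptions = load_error_descriptions(io.StringIO(SAMPLE_TSV))
print(descriptions["&agr-error"][0])
```

One such file per interface language would keep the CG rules themselves
free of user-facing text.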

The suggestion module could be a bit more involved. For each error tag,
we could have a CG that corrects the analysis of the sentence, and then
generate the surface form from the corrected analysis using HFST.
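
A toy sketch of that flow, with plain Python stand-ins: the per-tag fix
function takes the place of a CG rule set, and a dictionary takes the place
of an HFST generator transducer. All tag names, analysis strings and
function names are hypothetical.

```python
# Toy stand-ins only: a real implementation would apply a CG rule set and
# look the corrected analysis up in an HFST generator transducer.

# Hypothetical generator: analysis string -> surface form.
GENERATOR = {
    "be+V+Prs+Sg3": "is",
    "be+V+Prs+Pl": "are",
}

def fix_agreement(tags):
    """Hypothetical fix for an agreement error: singular -> plural."""
    return ["Pl" if tag == "Sg3" else tag for tag in tags]

# One fix per error tag, mirroring "for each error tag, a CG".
FIXES = {"&agr-error": fix_agreement}

def suggest(lemma, tags, error_tag):
    """Return a corrected surface form, or None if no suggestion is possible."""
    fix = FIXES.get(error_tag)
    if fix is None:
        return None
    analysis = "+".join([lemma] + fix(tags))
    return GENERATOR.get(analysis)

print(suggest("be", ["V", "Prs", "Sg3"], "&agr-error"))
```

An error tag without a registered fix simply yields no suggestion, so the
checker could still report the error with its description.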

> What about user help texts for the errors identified? 

For help texts, see above: they could go in the same TSV or XML file.

> And what about non-linguistic errors of the type wrong telephone number 
> formatting?

I'm not sure I'd put those in a grammar checker, especially as they vary
so wildly between countries. 

Fran



