[libvoikko] error handling in the grammar checker

Harri Pitkänen hatapitk at iki.fi
Tue Oct 1 21:42:05 EEST 2013


On Tuesday 01 October 2013 16:24:17 Francis Tyers wrote:
> I'd like to get ideas on how people think it would be best to deal with
> abstracting the grammar/error.hpp grammar/error.cpp code. At the moment
> there are some hardcoded messages for Finnish. For North Sámi, we have
> (so far) around 140 error tags: http://pastebin.com/YKGbCxzx

One thing to keep in mind is that the libvoikko API currently has these 
functions for dealing with error codes:


int voikkoGetGrammarErrorCode(const struct VoikkoGrammarError * error);

const char * voikko_error_message_cstr(int error_code, const char * language);


Notably, the second function, which is used to get the human-readable error 
description, does not receive any (even indirect) reference to VoikkoHandle. 
Thus it cannot be used if the codes have different meanings for different 
grammar checker implementations.

I'm open to extending this by adding new functions and deprecating these two. 
We cannot remove them entirely, but it is possible to maintain compatibility 
by allocating a single code to represent all implementation-specific errors. 
It would need to have a fixed description such as "Language specific grammar 
error".
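
To make the compatibility idea concrete, here is a rough sketch of how the 
old integer API could collapse every implementation-specific error into one 
code. Everything in it is an assumption for illustration: the struct, the 
function names, the value 1000 and the "sme-..." tag are hypothetical and not 
part of libvoikko. Only "grm-wrong-case" with legacy code 6 comes from the 
example later in this message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical catch-all value. A real value would have to be chosen so
 * that it cannot collide with any existing integer error code. */
#define LEGACY_CODE_LANGUAGE_SPECIFIC 1000

/* Hypothetical internal error record. Errors known to the old API carry
 * their integer code; implementation-specific ones carry 0 here. */
struct grammar_error_sketch {
    const char *code;   /* new string code, e.g. "grm-wrong-case" */
    int legacy_code;    /* old integer code, or 0 if there is none */
};

/* What the old integer getter could return for any error. */
int legacy_code_for(const struct grammar_error_sketch *error) {
    assert(error != NULL);
    return error->legacy_code != 0 ? error->legacy_code
                                   : LEGACY_CODE_LANGUAGE_SPECIFIC;
}

/* Fixed description for the catch-all code. Returns NULL for every other
 * code so that the caller can fall through to the existing lookup. */
const char *legacy_catch_all_message(int error_code) {
    if (error_code == LEGACY_CODE_LANGUAGE_SPECIFIC) {
        return "Language specific grammar error";
    }
    return NULL;
}
```

With this, an error such as { "grm-wrong-case", 6 } still reports 6 through 
the old API, while { "sme-hypothetical-tag", 0 } reports the catch-all code 
and gets the fixed description.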

> Rather than just including these in the C++ code, I think it might be
> better to abstract out into a file which contains the error codes and
> messages. This could be XML, or tab separated or however.

The API above makes it essentially impossible to read the strings from a file 
at runtime. We return a "const char *", and callers may assume that the 
pointer stays valid forever and never free it. The only way to load strings 
at runtime without breaking that would be to read each string and just let it 
leak...

But we could use such an XML file during compilation to build a static data 
structure to hold the error codes and descriptions. This would definitely be 
an improvement over the current situation and would avoid changing the public 
API for now. I'm sure that it needs to be changed at some point though.
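
As a sketch of what the build step could emit: a constant table with static 
storage duration, so every returned pointer stays valid for the life of the 
process and nothing has to leak. The type and function names, the fallback 
text and the description strings are placeholders I made up; a real generator 
would fill in the rows from the XML file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the table that a build step could generate from the XML. */
struct error_description_row {
    int error_code;
    const char *language;
    const char *description;
};

/* Placeholder rows. The texts are illustrative, not real messages. */
static const struct error_description_row GENERATED_ROWS[] = {
    { 6, "fi", "(Finnish description for code 6)" },
    { 6, "en", "(English description for code 6)" },
};

/* Hypothetical fallback for codes or languages missing from the table. */
static const char UNKNOWN_DESCRIPTION[] = "(unknown error)";

/* Same shape as the existing description function: no handle, and the
 * result points into static storage, so the caller never frees it. */
const char *lookup_description(int error_code, const char *language) {
    const size_t n = sizeof GENERATED_ROWS / sizeof GENERATED_ROWS[0];
    size_t i;
    assert(language != NULL);
    for (i = 0; i < n; ++i) {
        if (GENERATED_ROWS[i].error_code == error_code &&
            strcmp(GENERATED_ROWS[i].language, language) == 0) {
            return GENERATED_ROWS[i].description;
        }
    }
    return UNKNOWN_DESCRIPTION;
}
```

A linear scan is enough for a couple of hundred rows; the generator could 
sort the rows and use bsearch if that ever became a bottleneck.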

> One question is, which information would we like to have in the file ?
> 
> To deal with the Finnish we'd need at least something like:
> 
> <code>13</code>
> <descriptions>
>   <description xml:lang="fin"/>
>   <description xml:lang="eng"/>
> </descriptions>
>
> We could also think about having long/short descriptions, and also of
> linking in the suggestions somehow.
> 
> Any thoughts on a nice file format for this ?

I think we need two codes: an int for the current API and a string for the 
new one:

<code>grm-wrong-case</code>
<legacyCode>6</legacyCode>
<descriptions>
  <shortDescription xml:lang="fi"/>
  <longDescription xml:lang="fi"/>
  <shortDescription xml:lang="en"/>
  <longDescription xml:lang="en"/>
</descriptions>
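
For illustration, one entry like the above could map to a C record along 
these lines. This is only a sketch under my own assumptions: the struct and 
function names are hypothetical and the description texts are placeholders, 
since the XML above leaves them empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Short and long description for one language. */
struct localized_description {
    const char *language;
    const char *short_description;
    const char *long_description;
};

/* One error definition carrying both the new and the legacy code. */
struct error_definition {
    const char *code;   /* string code for the new API */
    int legacy_code;    /* int code for the current API */
    const struct localized_description *descriptions;
    size_t description_count;
};

/* Placeholder texts; the real ones would come from the XML file. */
static const struct localized_description WRONG_CASE_TEXTS[] = {
    { "fi", "(short, fi)", "(long, fi)" },
    { "en", "(short, en)", "(long, en)" },
};

static const struct error_definition DEFINITIONS[] = {
    { "grm-wrong-case", 6, WRONG_CASE_TEXTS,
      sizeof WRONG_CASE_TEXTS / sizeof WRONG_CASE_TEXTS[0] },
};

/* Find a definition by its string code; NULL if there is none. */
const struct error_definition *find_definition(const char *code) {
    const size_t n = sizeof DEFINITIONS / sizeof DEFINITIONS[0];
    size_t i;
    assert(code != NULL);
    for (i = 0; i < n; ++i) {
        if (strcmp(DEFINITIONS[i].code, code) == 0) {
            return &DEFINITIONS[i];
        }
    }
    return NULL;
}
```

Keeping both codes in the same record means the old int API and a new 
string-based one could be served from a single source of truth.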


Harri


