[libvoikko] Aligning development of hfst-based proofing tools

Sjur Moshagen sjurnm at mac.com
Mon Sep 20 14:30:38 EEST 2010


Hello all,

This is a follow-up to a discussion that started with a small suggestion I sent to a few people earlier this summer. Based on the reply below and other feedback, I have now put together a first draft of what could become a specification for a speller lexicon file format for hfst-based spellers (see the attached PDF file).

I have deliberately left out any mention of a hyphenator: in a transducer context, it is cleaner and easier to handle hyphenation as a separate entity, with its own interface or file format specification.

So far I have also specified only the lexicon file format, and said nothing about API-level conventions. This is partly to start with something quite restricted and see how it develops, and partly because I'm not sure we need much more at the moment to ensure that the *speller lexicons* are compatible across implementations. That is, the first goal is to ensure that I can take my speller lexicon file (my linguistic assets, so to speak), move it to another implementation, and be confident that it will work with default features without further modifications.

This will allow me as a linguist to concentrate on what I know, and rest assured that my investments will not be lost if I need to switch to another implementation of hfst-based spellers as long as they follow this specification.

Feedback on any part of this specification, and suggestions for expansions, additions, etc., are very welcome.

The source XML file (which renders quite well in modern web browsers) is available in our SVN repository:

https://victorio.uit.no/langtech/trunk/techdoc/proofdoc/spell/hfst/lexfile-spec.xml

(Patches against this file are also welcome.)

On 1 Jul 2010, at 17:22, Harri Pitkänen wrote:

> Hi!
> 
> On Wednesday 30 June 2010, Sjur Moshagen wrote:
>> What I would suggest is that we do some cooperative work to align certain
>> interfaces, such that the transducers produced for one proofing engine
>> (say, hfst+voikko) would be automatically usable also by the other runtime
>> proofing environment. I would also suggest that we make this interface a
>> public specification, such that other parties, when writing their own (or
>> extending existing) proofing runtime for hfst-based transducers, can stay
>> compatible.
> 
> I support this. We have not yet agreed on any final HFST interface for Voikko 
> so for us there is plenty of time to come up with such specification. But if 
> one is created I hope it can be made stable enough so that the transducer 
> interfaces would not need to be changed too often. One new revision in a year 
> would be fine.

That sounds like a good principle.

Best regards,
Sjur N. Moshagen
Samediggi · Sametinget
Project Manager for the Divvun project
http://www.divvun.no/
http://www.samediggi.no/
+358-9-49 75 29 (w)
+358-505 634 319 (m)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lexfile-spec.pdf
Type: application/pdf
Size: 76557 bytes
Desc: not available
URL: <http://lists.puimula.org/pipermail/libvoikko/attachments/20100920/8568e963/attachment.pdf>