[libvoikko] Aligning development of hfst-based proofing tools

Flammie Pirinen flammie at iki.fi
Tue Sep 28 06:47:39 EEST 2010


2010-09-24, Sjur Moshagen wrote:

> Whether hyphenation should be in the same or a separate
> archive is an open question. Some implementations basically use one
> and the same lexicon for encoding both the accepted language and the
> hyphenation patterns, whereas others separate the two. This is partly
> technology-based (i.e. the TeX hyphenation patterns are clearly
> separate from any *spell lexicons and implementations), partly
> dictated by the host applications (i.e. in MS Office, there are
> separate APIs and files for spellers and hyphenators, even though
> the present (non-hfst-based) Sámi speller and hyphenation lexicons
> are one and the same).

Yeah, I couldn't say anything specific either. For Finnish we have
TeX hyphenators, omorfi's dictionary-based one and omorfi's
rule-based one (and all from the voikko project), and all have their
pros and cons. Still, since we use transducer technology, the rules and
dictionaries for hyphenation can be kept separate with little size or
performance penalty, so deciding either way is just as good, I'd think.

> I tend to think that it will be a cleaner setup if we separate the
> two in different files, even though this might in some cases
> duplicate information or files. That would also entail that we would
> write a separate (but obviously quite similar) specification for the
> hyphenation file.

Makes sense. 

> Tommi's suggestion seems fine, but it would be good for (backwards)
> compatibility checks that this is specified in the index.xml file.

Certainly, no harm done in having metadata spelled out in the index. 
 
> [Side note: requiring HFST 3 headers presupposes that HFST 3 is
> available on relevant platforms - any plans for the public release
> of it? There are still build problems on the Mac;) ]

I thought I killed off some bugs a while ago, at least with foma linking.
Anyway, when I left for a vacation a while ago, the implementation for
detachable and linkable backend libraries seemed nearly finished, so I'd
expect they are done by now (I haven't checked, since the Internet where
I'm at is still too unreliable for using svn or remote servers). This
way, if foma doesn't work or link on the Mac, it can at least be
disabled from Mac builds.

Of course, for voikko only the hfst-optimized-format backend or library
is needed, so in any case I'd think that already builds on Macs.


-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>


