[libvoikko] Proposed interface for hyphenator components

Krister Lindén krister.linden at helsinki.fi
Tue Dec 8 04:43:52 EET 2009


Harri Pitkänen wrote on December 7, 2009, at 20:01:
> On Monday 07 December 2009 02:55:46 Krister Lindén wrote:
>> Other effects on the spelling could also be modeled with this, but the
>> consolidated output assumes that the effects of hyphenation are always
>> local. That is, hyphenating at one point has no discontinuous
>> side effects further away in the string, and any changes in the input
>> string surrounding the introduced hyphen relate to that hyphen: the
>> first unchanged character on either side of a hyphen ends the region
>> of the input string that needs to be modified.
> 
> If I understood correctly, the array
> 
> q q 0
> w w 1
> e é 1
> a a 0
> 
> would result in hyphenations "qw-éa" and "qwé-a". At first I thought that it
> is not possible to represent a case where the word should be hyphenated as
> {"qw-ea", "qwé-a"} or {"qw-éa", "qwe-a"}, meaning that changes in one-letter
> syllables cannot be forced to appear on only one side of the hyphenation
> point. But perhaps if we want to hyphenate "qwea" as {"qw-ea", "qwé-a"} we
> could use a zero-length insertion like this:
> 
> q q 0
> w w 1
>      0
> e é 1
> a a 0
> 
> It seems like this format could be enough for our needs.

Yes. The zero-length identical insertion is the intended solution. I did
not wish to burden my previous message with too much detail. The format
will work as long as the changes to the input do not overlap, as in:

  qwea -> qu-éa or qwi-a

Although I have never heard of this happening in any language, I guess
strange things could happen at compound boundaries due to differing
compound splits at consecutive hyphenation points. For now, I think the
format is sufficient, but we will need to watch for possible counterexamples.
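
As a rough illustration of how the arrays above could be read, here is a
minimal Python sketch. It is not libvoikko code. It assumes that each row is
an (input, output, flag) triple, that a flag of 1 allows a hyphen directly
after that row, and that a change is applied only across the unbroken run of
changed rows adjacent to the chosen hyphen. The function name
hyphenation_variants is hypothetical.

```python
def hyphenation_variants(rows):
    """Expand an array of (input, output, flag) rows into hyphenated forms.

    Assumed reading of the format (not confirmed by the source):
    flag == 1 allows a hyphen directly after the row, and a row's output
    replaces its input only if the row lies in the unbroken run of
    changed rows next to the chosen hyphen.
    """
    variants = []
    for i, (_, _, flag) in enumerate(rows):
        if not flag:
            continue
        # Start from the unmodified input pieces.
        pieces = [src for src, _, _ in rows]
        # Walk left from the hyphen until the first unchanged row.
        j = i
        while j >= 0 and rows[j][0] != rows[j][1]:
            pieces[j] = rows[j][1]
            j -= 1
        # Walk right from the hyphen until the first unchanged row.
        j = i + 1
        while j < len(rows) and rows[j][0] != rows[j][1]:
            pieces[j] = rows[j][1]
            j += 1
        variants.append("".join(pieces[:i + 1]) + "-" + "".join(pieces[i + 1:]))
    return variants


# The array from the quoted message: the change e -> é touches both hyphens.
plain = [("q", "q", 0), ("w", "w", 1), ("e", "é", 1), ("a", "a", 0)]
print(hyphenation_variants(plain))      # ['qw-éa', 'qwé-a']

# With a zero-length identical row, the change stays on one side only.
separated = [("q", "q", 0), ("w", "w", 1), ("", "", 0), ("e", "é", 1), ("a", "a", 0)]
print(hyphenation_variants(separated))  # ['qw-ea', 'qwé-a']
```

Under this reading, the overlapping case qwea -> qu-éa or qwi-a cannot be
encoded, because the single row for "e" would need two different outputs.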

Krister



