[libvoikko] Proposed interface for hyphenator components
Krister Lindén
krister.linden at helsinki.fi
Tue Dec 8 04:43:52 EET 2009
Harri Pitkänen wrote on December 7, 2009, at 20:01:
> On Monday 07 December 2009 02:55:46 Krister Lindén wrote:
>> Other effects on the spelling could also be modeled with this, but the
>> consolidated output assumes that the effects of hyphenation are always
>> local: hyphenating at one point has no discontinuous side effects further
>> away in the string, and any changes in the input string surrounding the
>> introduced hyphen relate to that hyphen. In other words, the first
>> non-changing character on either side of a hyphen ends the region of the
>> input string that needs to be modified.
>
> If I understood correctly, the array
>
> q q 0
> w w 1
> e é 1
> a a 0
>
> would result in hyphenations "qw-éa" and "qwé-a". At first I thought that it
> is not possible to represent the case where the word should be hyphenated as
> {"qw-ea", "qwé-a"} or {"qw-éa", "qwe-a"}, meaning that changes in one-letter
> syllables cannot be forced to appear on only one side of the hyphenation
> point. But perhaps if we want to hyphenate "qwea" as {"qw-ea", "qwé-a"} we
> could use a zero-length insertion like this:
>
> q q 0
> w w 1
> 0
> e é 1
> a a 0
>
> It seems like this format could be enough for our needs.
Yes. The zero-length identical insertion is the intended solution. I did
not wish to belabor my previous message with too much detail. The format
will work as long as the changes to the input do not overlap, as in:

  qwea -> qu-éa or qwi-a

Although I have never heard of that in any language, I guess strange
things could happen at compound boundaries due to differing compound
splits at consecutive hyphenation points. For now, I think the format is
sufficient, but we will need to stay tuned for possible counterexamples.
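
To make the locality rule concrete, here is a minimal sketch in Python of
how the consolidated array could be expanded into hyphenated forms. It is
only an illustration under stated assumptions, not libvoikko code: the row
layout (input text, output text, flag), the reading of the flag as "a hyphen
may follow this row", and the name expand are all hypothetical and inferred
from the examples quoted above.

```python
from typing import List, Tuple

# Hypothetical row layout: (input text, output text, hyphen-allowed-after flag).
Row = Tuple[str, str, int]


def expand(rows: List[Row]) -> List[str]:
    """Return one hyphenated string per row whose flag is set."""
    results = []
    for i, (_, _, flag) in enumerate(rows):
        if not flag:
            continue
        # Start from the unchanged input for every row.
        pieces = [src for src, _, _ in rows]
        # Walk left from the hyphen, applying changes until the first
        # row whose input and output are identical.
        j = i
        while j >= 0 and rows[j][0] != rows[j][1]:
            pieces[j] = rows[j][1]
            j -= 1
        # Walk right from the hyphen in the same way.
        k = i + 1
        while k < len(rows) and rows[k][0] != rows[k][1]:
            pieces[k] = rows[k][1]
            k += 1
        results.append("".join(pieces[: i + 1]) + "-" + "".join(pieces[i + 1 :]))
    return results


# The array from the first quoted example.
plain_rows = [("q", "q", 0), ("w", "w", 1), ("e", "é", 1), ("a", "a", 0)]

# The same array with a zero-length identical row between "w" and "e".
split_rows = [("q", "q", 0), ("w", "w", 1), ("", "", 0), ("e", "é", 1), ("a", "a", 0)]

print(expand(plain_rows))  # ['qw-éa', 'qwé-a']
print(expand(split_rows))  # ['qw-ea', 'qwé-a']
```

The zero-length row counts as a non-changing character, so it stops the
change in "e" from being applied to the hyphen after "w". Overlapping
changes such as the qu-éa / qwi-a case cannot be written down in this row
layout at all, since each row carries only one output text.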
Krister