[libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend
Francis Tyers
ftyers at prompsit.com
Sun Feb 28 21:40:19 EET 2010
On Sun, 28 Feb 2010 at 21:18 +0100, Jacob Nordfalk wrote:
>
>
> 2010/2/28 Francis Tyers <ftyers at prompsit.com>
> On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
> > On Sunday 28 February 2010, Francis Tyers wrote:
> > > > I don't know Icelandic at all and therefore can't tell whether some of
> > > > the words are accepted or rejected incorrectly.
> > >
> > > Nice, it looks good. Some of the capitalised words should be recognised
> > > as correct, at least 'Bretlandi' and 'Norðmenn'.
> >
>
> > I tried to fix the checking of capitalized words but started to run into
> > problems. It seems that the library API works in somewhat surprising (at
> > least to me) ways when you enter a word that starts with a capital letter
> > and ends with garbage.
> >
> > The implementation is here
> >
> > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
> >
> > and test cases here
> >
> > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
> >
> > I was able to make all test cases pass except the one with TODO in its
> > method name. How would you suggest fixing the code so that all tests would
> > pass? Of course a patch would be most welcome :)
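One common way to check capitalised words is to look the word up as typed and then with its first letter lowercased, accepting it only if some variant is a complete dictionary entry. This is an assumption for illustration, not a description of what LttoolboxAnalyzer.cpp does; `case_variants`, `is_correct` and the word set `KNOWN` are hypothetical stand-ins for a real full-match analyser lookup, and all-caps input is not handled.

```python
def case_variants(word):
    """Spellings to look up, in order: as typed, then first letter lowercased."""
    variants = [word]
    if word[:1].isupper():
        variants.append(word[:1].lower() + word[1:])
    return variants


def is_correct(word, known):
    """Accept word only if some case variant is a complete entry in `known`.

    `known` is a hypothetical stand-in for a full-match analyser lookup.
    """
    return any(variant in known for variant in case_variants(word))


# Hypothetical word list for illustration only.
KNOWN = {"Bretlandi", "Norðmenn", "hús"}

for w in ["Bretlandi", "Hús", "Bretlandinghf", "bretlandi"]:
    print(w, is_correct(w, KNOWN))
```

Because only exact entries count, a capitalised word with trailing garbage ('Bretlandinghf') is rejected, while a sentence-initial capital on a lowercase entry ('Hús') is still accepted.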
>
> Hmm, strangely enough, when I try an unknown word I get similarly strange
> output:
>
> $ ./test mor.bin
> ^Reykjanghfghesi$ -->
> ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
>
> It seems that in 'biltrans' mode, the 'standard' sections are treated as
> unconditional, i.e. it just returns the longest match in all cases.
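The longest-match behaviour described above can be modelled with a toy lexicon. This is a hypothetical sketch, not lttoolbox code: `LEXICON` is a plain dict whose analyses are copied from the test output above, and `longest_prefix_analyses` and `full_match_analyses` are illustrative names. It shows why 'Reykjanghfghesi' receives the analyses of 'Reykja' when leftover input is ignored, and how a lookup that requires the whole word to match rejects it.

```python
# Hypothetical toy lexicon: surface form -> analyses. The analyses are
# copied from the test output above, not read from mor.bin.
LEXICON = {
    "Reykja": [
        "Reykja<vblex><actv><inf>",
        "Reykja<vblex><actv><pri><p3><pl>",
        "Reykur<n><m><pl><gen><ind>",
    ],
}


def longest_prefix_analyses(word):
    """Return (analyses, leftover) for the longest lexicon entry that is a
    prefix of word; the leftover input is returned but otherwise ignored."""
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in LEXICON:
            return LEXICON[prefix], word[end:]
    return [], word


def full_match_analyses(word):
    """Return analyses only if the whole word is a lexicon entry."""
    return LEXICON.get(word, [])


analyses, leftover = longest_prefix_analyses("Reykjanghfghesi")
print(analyses, repr(leftover))
print(full_match_analyses("Reykjanghfghesi"))
```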
>
> I will think some more about this.
>
>
> Biltrans is actually supposed to work like this.
> I don't understand why you would use biltrans in an analyser.
Because biltrans takes a string, not a FILE*
>
> In biltrans, partial matches are allowed. The symbols (and letters) after
> the match are called the queue.
> For example, the input house<n><sg>
> matches the bidix entry house<n> -> domo<n>, and the queue is <sg>.
> The result is domo<n><sg>.
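The house<n><sg> example can be modelled as a longest-prefix lookup over symbols (single letters or whole <tags>), with the unmatched remainder, the queue, appended to the translation. `BIDIX`, `symbols` and `biltrans` are hypothetical names in a toy dict-based sketch, not the lttoolbox API, and real lttoolbox matching runs over a compiled transducer rather than a dict.

```python
import re

# Hypothetical toy bilingual dictionary with the single entry from the example.
BIDIX = {"house<n>": "domo<n>"}


def symbols(s):
    """Split a lexical unit into single letters and whole <tag> symbols."""
    return re.findall(r"<[^<>]+>|.", s)


def biltrans(lexical_unit):
    """Translate the longest matching symbol prefix and append the queue."""
    syms = symbols(lexical_unit)
    for end in range(len(syms), 0, -1):
        prefix = "".join(syms[:end])
        if prefix in BIDIX:
            queue = "".join(syms[end:])  # e.g. "<sg>"
            return BIDIX[prefix] + queue
    return None  # no entry matches any prefix


print(biltrans("house<n><sg>"))
```

Carrying the queue over is what makes partial matching useful in transfer, and it is also why the same lookup accepts words with trailing garbage when it is reused as an analyser.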
Hmm, ok, so probably we need a new method :(
Fran