[libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend
Francis Tyers
ftyers at prompsit.com
Sun Feb 28 22:07:12 EET 2010
On Sun, 28 Feb 2010 at 21:47 +0100, Jacob Nordfalk wrote:
>
>
> 2010/2/28 Francis Tyers <ftyers at prompsit.com>
> On Sun, 28 Feb 2010 at 21:18 +0100, Jacob Nordfalk wrote:
>
> >
> >
> > 2010/2/28 Francis Tyers <ftyers at prompsit.com>
> > On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
> > > On Sunday 28 February 2010, Francis Tyers wrote:
> > > > > I don't know Icelandic at all and therefore can't tell whether some of
> > > > > the words are accepted or rejected incorrectly.
> > > >
> > > > Nice, it looks good. Some of the capitalised words should be recognised
> > > > correctly, at least 'Bretlandi' and 'Norðmenn'.
> > >
> >
> > > I tried to fix the checking of capitalized words but started to run into
> > > problems. It seems that the library API works in somewhat surprising (at least
> > > to me) ways when you enter a word that starts with a capital letter and ends
> > > with garbage.
> > >
> > > The implementation is here
> > >
> > > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
> > >
> > > and test cases here
> > >
> > > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
> > >
> > > I was able to get all test cases except the one with TODO in the method name
> > > implemented. How would you suggest fixing the code so that all tests would
> > > pass? Of course a patch would be most welcome :)
> >
> > Hmm, strangely enough, when I try an unknown word I get similar strange
> > output:
> >
> > $ ./test mor.bin
> > ^Reykjanghfghesi$ -->
> > ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> >
> > It seems that in the 'biltrans' mode, the 'standard' sections are
> > treated as unconditional, i.e. it just returns the longest match in all
> > cases.
> >
> > I will think some more about this.
> >
> >
> > Biltrans must actually work like this.
> > I don't understand why you would use biltrans in an analyser.
>
>
> Because biltrans takes a string, not a FILE*
>
> >
> > In biltrans partial matches are allowed. The symbols (and letters) after
> > the match are called the queue.
> > For example, the input symbol house<n><sg>
> > matches in the bidix house<n> -> domo<n> and the queue is <sg>.
> > The result is domo<n><sg>.
>
>
>
> The above is the behaviour of biltransWithQueue()
>
Nah, that doesn't do it :/
The new method should probably just be a copy of the old one, except that it
checks whether all the input has been consumed.
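
As a rough illustration of the difference, here is a toy Python sketch. It is
not lttoolbox code: the dictionary and the names longest_prefix_match,
lookup_partial and lookup_strict are hypothetical, and the single entry is
copied from the output quoted above. It contrasts a lookup that accepts the
longest matching prefix with one that also requires the whole input to be
consumed:

```python
# Toy stand-in for a compiled dictionary: surface form -> analyses.
# Hypothetical data; the one entry is taken from the output quoted above.
TOY_DICT = {
    "Reykja": ["Reykja<vblex><actv><inf>"],
}


def longest_prefix_match(word, dictionary):
    """Return (prefix, remainder) for the longest key that prefixes word.

    Returns (None, word) when no key matches at all.
    """
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in dictionary:
            return prefix, word[end:]
    return None, word


def lookup_partial(word, dictionary=TOY_DICT):
    """Accept the longest match and silently drop whatever is left over."""
    prefix, _remainder = longest_prefix_match(word, dictionary)
    if prefix is None:
        return []
    return list(dictionary[prefix])


def lookup_strict(word, dictionary=TOY_DICT):
    """Same lookup, but reject the word unless all input was consumed."""
    prefix, remainder = longest_prefix_match(word, dictionary)
    if prefix is None or remainder:
        return []
    return list(dictionary[prefix])
```

With this toy data, lookup_partial("Reykjanghfghesi") still returns the
analysis for 'Reykja', while lookup_strict("Reykjanghfghesi") returns an empty
list, which is what a spell checker would want for a word that ends in garbage.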
Fran