[libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend

Francis Tyers ftyers at prompsit.com
Sun Feb 28 21:40:19 EET 2010


On Sun, 28 Feb 2010 at 21:18 +0100, Jacob Nordfalk wrote:
> 
> 
> 2010/2/28 Francis Tyers <ftyers at prompsit.com>
>         On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
>         > On Sunday 28 February 2010, Francis Tyers wrote:
>         > > > I don't know Icelandic at all and therefore can't tell
>         > > > whether some of the words are accepted or rejected
>         > > > incorrectly.
>         > >
>         > > Nice, it looks good. Some of the capitalised words should
>         > > be recognised as correct, at least 'Bretlandi' and
>         > > 'Norðmenn'.
>         >
>         
>         > I tried to fix the checking of capitalized words but started
>         > to run into problems. It seems that the library API works in
>         > somewhat surprising (at least to me) ways when you enter a
>         > word that starts with a capital letter and ends with garbage.
>         >
>         > The implementation is here
>         >
>         http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
>         >
>         > and test cases here
>         >
>         http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
>         >
>         > I was able to get all test cases except the one with TODO in
>         > the method name implemented. How would you suggest fixing the
>         > code so that all tests would pass? Of course a patch would be
>         > most welcome :)
>         
>         Hmm, strangely enough, when I try an unknown word I get
>         similar strange output:
>         
>         $ ./test mor.bin
>         ^Reykjanghfghesi$ -->
>         ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
>         
>         It seems that in 'biltrans' mode the 'standard' sections are
>         treated as inconditional, i.e. it just returns the longest
>         match in all cases.
>         
>         I will think some more about this.
> 
> 
> Biltrans must actually work like this.
> I don't understand why you would use biltrans in an analyser.

Because biltrans takes a string, not a FILE*

> 
> In biltrans partial matches are allowed. The symbols (and letters) after
> the match are called the queue.
> For example, the input house<n><sg>
> matches house<n> -> domo<n> in the bidix, and the queue is <sg>.
> The result is domo<n><sg>.

Hmm, ok, so probably we need a new method :(
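
To make the queue behaviour concrete, here is a minimal sketch in Python.
It is a toy only: a plain dict stands in for the compiled bidix, and
biltrans_sketch is a hypothetical helper, not an lttoolbox function.

```python
# Toy illustration only: TOY_BIDIX and biltrans_sketch are hypothetical
# stand-ins and do not correspond to anything in lttoolbox itself.
TOY_BIDIX = {"house<n>": "domo<n>"}


def biltrans_sketch(token, bidix=TOY_BIDIX):
    """Return (translation, queue) for the longest matching prefix of token."""
    for end in range(len(token), 0, -1):   # try the longest prefix first
        prefix = token[:end]
        if prefix in bidix:
            queue = token[end:]            # whatever the match did not consume
            return bidix[prefix] + queue, queue
    return None, token                     # nothing matched at all


print(biltrans_sketch("house<n><sg>"))     # ('domo<n><sg>', '<sg>')
```

For spell checking, a lookup would presumably have to treat leftover
letters in the queue as a failed match. That is a guess at what the new
method should do, not something the current code provides.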

Fran




