From hatapitk at iki.fi Mon Feb 8 17:52:47 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Mon, 8 Feb 2010 17:52:47 +0200
Subject: [libvoikko] Libvoikko 2.3
Message-ID: <201002081752.47717.hatapitk@iki.fi>

Libvoikko 2.3 has been released. Release notes and sources are available at
http://voikko.sourceforge.net

Harri

From hatapitk at iki.fi Mon Feb 8 20:20:44 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Mon, 8 Feb 2010 20:20:44 +0200
Subject: [libvoikko] Schedule for libvoikko 3.0 + applications
Message-ID: <201002082020.44278.hatapitk@iki.fi>

Libvoikko 3.0 will be released by the end of June this year. It could be
released earlier, but not later unless something really unexpected happens.
The reason for this fixed schedule is that a major limitation in our current
design (only one open dictionary at a time) is starting to become a real
problem now that people have started to use Voikko in more exotic
applications than just OOo, Firefox and Thunderbird. Removing this limitation
is going to be the most important goal of our next release.

Fixing this requires changes to our API. Now that we are changing the API
(for the first time since version 0.4, which was released almost four years
ago), a few other issues will be fixed as well. At least the hyphenator will
be changed, maybe the grammar checker too.

I intend to make the changes in such a way that existing symbol names will
not be reused for something different. This will allow the library to remain
compatible with the old API and ABI. You should be able to replace older
libraries with libvoikko 3.0 without breaking anything or recompiling the
applications. Additionally, there will be no changes to our dictionary format
in this release.

Ideally the transition plan to libvoikko 3.0 could look something like this
for the Linux distributors:

- Upload libvoikko 3.0 as soon as it is released. No additional conflicts or
  package name changes are needed.
- Upload new versions of mozvoikko, openoffice.org-voikko, tmispell and
  Enchant when they are released. If the versions use the new APIs from
  libvoikko 3.0 they should depend on the appropriate version of the library;
  nothing else needs to be changed.
- In some future version of libvoikko (maybe 3.1 but it could be later,
  perhaps never?) the compatibility APIs will be removed. At that point you
  need to rename the binary packages and -dev packages and change the
  depending packages to build against the new version. Since the depending
  packages were already changed to use the new APIs you do not need to modify
  the actual source, just rebuild.

In practice there will be a few changes to the behaviour of the old API in
libvoikko 3.0:

- Option VOIKKO_INTERSECT_COMPOUND_LEVEL will no longer have any effect. We
  discussed this in December and nobody supported maintaining this option
  anymore. I still do not know of any application that needs it. Programs
  using this option can be built and run but they will behave as if the
  option was not set.
- Option VOIKKO_OPT_ENCODING will no longer have any effect. This option has
  been deprecated since libvoikko 2.0 and the documentation has never
  specified what the allowed values for the option are (they were platform
  dependent). It would have technically been correct to always reject values
  other than the default (UTF-8). As far as I know, our test programs
  "voikkospell" and "voikkohyphenate" were the only applications that ever
  used this option with non-default values.

The new API should have some support for languages other than Finnish. This
support will be experimental so that we can avoid introducing a new
dictionary format version in this release. Depending on when the issues with
HFST are resolved, we will release either with an actual implementation for
some language (maybe English) or with just the API.
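As an illustration of the compatibility plan described above, here is a minimal sketch of how an old single-dictionary interface could be kept as a thin layer over a new object-based one, with a retired option still accepted but ignored. Every name in it (Checker, legacy_init, legacy_set_option, legacy_spell, LEGACY_OPT_*) is hypothetical and chosen only for this example; none of them are libvoikko symbols, and the real 3.0 API may look quite different.

```cpp
#include <set>
#include <string>

// All names below are hypothetical; they are NOT libvoikko symbols.

// New-style object: each instance is an independent open dictionary,
// so several of them can exist at the same time.
class Checker {
public:
    explicit Checker(const std::set<std::string>& words) : words_(words) {}
    bool spell(const std::string& word) const {
        return words_.count(word) > 0;
    }
private:
    std::set<std::string> words_;
};

// Old-style interface kept for existing callers: one global dictionary.
enum LegacyOption {
    LEGACY_OPT_IGNORE_DOT = 0, // still honoured
    LEGACY_OPT_ENCODING = 1    // retired: accepted but has no effect
};

static Checker* legacyChecker = 0;
static bool legacyIgnoreDot = false;

// Replaces the single global dictionary used by the old interface.
void legacy_init(const std::set<std::string>& words) {
    delete legacyChecker;
    legacyChecker = new Checker(words);
}

// Old programs that set the retired option still build and run,
// but they behave as if the option had never been set.
bool legacy_set_option(LegacyOption option, int value) {
    switch (option) {
    case LEGACY_OPT_IGNORE_DOT:
        legacyIgnoreDot = (value != 0);
        return true;
    case LEGACY_OPT_ENCODING:
        return true; // value is discarded on purpose
    }
    return false;
}

// The old entry point keeps its name and delegates to the new object.
bool legacy_spell(const std::string& word) {
    if (legacyChecker == 0) {
        return false;
    }
    std::string checked = word;
    if (legacyIgnoreDot && !checked.empty()
        && checked[checked.size() - 1] == '.') {
        checked.erase(checked.size() - 1);
    }
    return legacyChecker->spell(checked);
}
```

The point of the sketch is only that the old names keep their old meaning while new code can create as many Checker objects as it needs.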
Harri

From andris.pavenis at iki.fi Wed Feb 10 07:50:37 2010
From: andris.pavenis at iki.fi (Andris Pavenis)
Date: Wed, 10 Feb 2010 07:50:37 +0200
Subject: [libvoikko] Libvoikko 2.3
In-Reply-To: <201002081752.47717.hatapitk@iki.fi>
References: <201002081752.47717.hatapitk@iki.fi>
Message-ID: <4B72492D.3020205@iki.fi>

On 08.02.2010 17:52, Harri Pitkänen wrote:
> Libvoikko 2.3 has been released. Release notes and sources are available at
> http://voikko.sourceforge.net

Fails to build on CentOS 5.4 (i386) (which probably means it also fails on
the corresponding RHEL version) with gcc version 4.1.2 20080704 (Red Hat
4.1.2-46) due to warnings turned into errors by option -Werror. Output of
make is at the end of this message.

Andris

PS. Builds OK under Fedora 12 (x86_64) with:
gcc version 3.4.6 20060404 (Red Hat 3.4.6-18)
gcc version 4.4.2 20091222 (Red Hat 4.4.2-20) (GCC)

=======================================================================
Output of Make in CentOS 5.4 i386

[andris at ap ix86]$ make
make all-recursive
make[1]: Entering directory "/home/andris/Build/voikko/libvoikko/2.3/ix86"
Making all in src
make[2]: Entering directory "/home/andris/Build/voikko/libvoikko/2.3/ix86/src"
make all-am
make[3]: Entering directory "/home/andris/Build/voikko/libvoikko/2.3/ix86/src"
depbase=`echo spellchecker/spell.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../../libvoikko-2.3/src -I.. -g -O2 -fvisibility=hidden -Wall -Werror -pedantic -MT spellchecker/spell.lo -MD -MP -MF $depbase.Tpo -c -o spellchecker/spell.lo ../../libvoikko-2.3/src/spellchecker/spell.cpp &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../../libvoikko-2.3/src -I..
-g -O2 -fvisibility=hidden -Wall -Werror -pedantic -MT spellchecker/spell.lo -MD -MP -MF spellchecker/.deps/spell.Tpo -c ../../libvoikko-2.3/src/spellchecker/spell.cpp -fPIC -DPIC -o spellchecker/.libs/spell.o
cc1plus: warnings being treated as errors
../../libvoikko-2.3/src/spellchecker/Speller.hpp:37: warning: 'class libvoikko::spellchecker::Speller' has virtual functions but non-virtual destructor
../../libvoikko-2.3/src/hyphenator/Hyphenator.hpp:28: warning: 'class libvoikko::hyphenator::Hyphenator' has virtual functions but non-virtual destructor
make[3]: *** [spellchecker/spell.lo] Error 1
make[3]: Leaving directory "/home/andris/Build/voikko/libvoikko/2.3/ix86/src"
make[2]: *** [all] Error 2
make[2]: Leaving directory "/home/andris/Build/voikko/libvoikko/2.3/ix86/src"
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory "/home/andris/Build/voikko/libvoikko/2.3/ix86"
make: *** [all] Error 2

From hatapitk at iki.fi Wed Feb 10 19:06:54 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Wed, 10 Feb 2010 19:06:54 +0200
Subject: [libvoikko] Libvoikko 2.3
In-Reply-To: <4B72492D.3020205@iki.fi>
References: <201002081752.47717.hatapitk@iki.fi> <4B72492D.3020205@iki.fi>
Message-ID: <201002101906.54727.hatapitk@iki.fi>

On Wednesday 10 February 2010, Andris Pavenis wrote:
> Fails to build on CentOS 5.4 (i386) (it means that perhaps also on
> corresponding RHEL version) with gcc version 4.1.2 20080704 (Red Hat
> 4.1.2-46)

Fixed, thanks. The fix will be in the next release. You don't have to wait
very long because the Enchant problem reported on [voikko] is going to
require a release of 2.3.1 next week.
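For readers unfamiliar with the warning in the build log above, the following is a minimal sketch of the situation it describes and of the usual remedy, a virtual destructor in the base class. The classes here are simplified stand-ins written for this example; they are not the actual libvoikko Speller or Hyphenator, and FinnishSpeller and checkOnce are hypothetical names.

```cpp
#include <string>

// Simplified stand-in for an abstract interface such as a speller.
// Hypothetical: the real libvoikko class has a different interface.
class Speller {
public:
    // Without this virtual destructor, deleting a derived object through
    // a Speller* is undefined behaviour. That is the situation the
    // "virtual functions but non-virtual destructor" warning points at.
    virtual ~Speller() {}
    virtual bool spell(const std::wstring& word) const = 0;
};

// Counts live instances so the effect of the destructor is observable.
class FinnishSpeller : public Speller {
public:
    static int liveInstances;
    FinnishSpeller() { ++liveInstances; }
    ~FinnishSpeller() { --liveInstances; }
    // Toy rule for the example: any non-empty word is accepted.
    bool spell(const std::wstring& word) const { return !word.empty(); }
};

int FinnishSpeller::liveInstances = 0;

// Creates a speller, uses it and destroys it through the base pointer.
bool checkOnce(const std::wstring& word) {
    Speller* speller = new FinnishSpeller();
    bool ok = speller->spell(word);
    delete speller; // runs ~FinnishSpeller() because ~Speller() is virtual
    return ok;
}
```

With -Werror in the compiler flags, as in the log, the warning alone is enough to stop the build, which would explain why the same sources can build on one GCC version and fail on another that reports it.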
Harri

From hatapitk at iki.fi Sat Feb 13 18:40:22 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Sat, 13 Feb 2010 18:40:22 +0200
Subject: [libvoikko] libvoikko 2.3.1rc1
Message-ID: <201002131840.23058.hatapitk@iki.fi>

Libvoikko 2.3.1rc1 is available for testing at
http://www.puimula.org/htp/testing/libvoikko-2.3.1rc1.tar.gz

The following fixes and improvements have been made since version 2.3:

2010-02-13 Harri Pitkänen
  * Increase version numbers for 2.3.1.

2010-02-11 Andris Pavenis
  * Updated MSVC project file.

2010-02-11 Harri Pitkänen
  * Replace uses of towlower with locale independent implementation.

2010-02-10 Harri Pitkänen
  * Add virtual destructors for Speller and Hyphenator.

This release candidate will be released as 2.3.1 on Wednesday 2010-02-17
unless problems are found during testing.

Harri

From hatapitk at iki.fi Sat Feb 13 19:14:43 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Sat, 13 Feb 2010 19:14:43 +0200
Subject: [libvoikko] mozvoikko 1.0.1rc1
Message-ID: <201002131914.43868.hatapitk@iki.fi>

New release candidate of mozvoikko (spell checking extension for Mozilla
products like Firefox) by Andris Pavenis is available for testing at
http://www.puimula.org/htp/testing/mozvoikko-1.0.1rc1.tar.gz

More information (in Finnish) is available at
http://ap1.pp.fi/mozilla/mozilla+voikko.html

Binary extension packages for Windows, Linux (32 and 64 bit) and OS X (Intel)
are also available. These come with libvoikko and Suomi-malaga and should be
usable for checking Finnish right after installation:
http://ap1.pp.fi/mozilla/mozvoikko/1.0.1/

This release requires libvoikko 2.3. (Older versions will work if you do not
need to ship libvoikko as a part of the extension, which is the case for
Linux distributions).
Compared to mozvoikko 1.0 this release contains significant compatibility
improvements:

- Works with Firefox 3.6
- Works with OS X on PPC (Darwin_ppc-gcc3)
- Works with GNU/kFreeBSD

This release candidate will be released as 1.0.1 on Wednesday 2010-02-17
unless problems are found during testing.

Harri

From hatapitk at iki.fi Wed Feb 17 19:41:11 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Wed, 17 Feb 2010 19:41:11 +0200
Subject: [libvoikko] libvoikko 2.3.1 and mozvoikko 1.0.1 released
Message-ID: <201002171941.11972.hatapitk@iki.fi>

Libvoikko 2.3.1 and mozvoikko 1.0.1 have been released. Sources and release
notes are available from our SourceForge site.

Harri

From ftyers at prompsit.com Tue Feb 23 23:57:47 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Tue, 23 Feb 2010 22:57:47 +0100
Subject: [libvoikko] [Apertium-stuff] Apertium with Berber languages - multiple affixes (pre- and postfixes)
In-Reply-To:
References: <02b058bf4759cf0b24fdf2a7825c9704@www2-mail.volny.cz>
  <4B842D5C.4060606@dlsi.ua.es> <1266957941.2929.12397.camel@eki.dlsi.ua.es>
  <1266960825.2929.12514.camel@eki.dlsi.ua.es>
  <8ed208e3d623b071b1b06dfaa93e97ed@www1-mail.volny.cz>
  <1266961712.2929.12550.camel@eki.dlsi.ua.es>
Message-ID: <1266962267.2929.12582.camel@eki.dlsi.ua.es>

On Tue, 23 Feb 2010 at 22:53 +0000, Jimmy O'Regan wrote:
> On 23 February 2010 21:48, Francis Tyers wrote:
> > On Tue, 23 Feb 2010 at 23:44 +0100, Paul Anderson wrote:
> >> Oh no.. I was planning Hunspell spell checkers too!
> >
> > If you are interested in spell checking you should check out HFST and
> > libvoikko. The libvoikko mailing list is at:
> >
> >
>
> The reality is that few people are interested in spell checking for
> its own sake; hunspell is what OpenOffice uses, that's why people
> target it.

libvoikko works with OpenOffice, and Firefox.
Fran

From hatapitk at iki.fi Wed Feb 24 20:55:06 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Wed, 24 Feb 2010 20:55:06 +0200
Subject: [libvoikko] Wiki pages related to libvoikko
Message-ID: <201002242055.06904.hatapitk@iki.fi>

A few days ago I enabled the SourceForge Trac feature for Voikko. Currently
only the Trac wiki is used, but soon we will start using the ticket feature
too. It will replace the SourceForge tracker for bug reporting.

I wrote two pages related to libvoikko. One contains my plans for the next
releases

https://sourceforge.net/apps/trac/voikko/wiki/libvoikko/RoadMap

and another lists issues I have found while investigating some of the
morphologies that have been proposed for use in libvoikko

https://sourceforge.net/apps/trac/voikko/wiki/libvoikko/SupportedLanguages

There should be nothing new on these pages that has not yet been mentioned on
this list or elsewhere. Maybe it is still useful to have this information in
one place. Perhaps the only "new" thing is moving HFST integration to
libvoikko 3.1. This is not a final decision. I just believe that given the
current release schedule of libvoikko and the large changes being made in
HFST it is better for me to concentrate on fixing issues that affect Finnish
users the most. Patches for HFST related changes are still accepted and will
be merged quickly if someone is interested in having the integration happen
sooner.

Harri

From ftyers at prompsit.com Wed Feb 24 20:12:49 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Wed, 24 Feb 2010 19:12:49 +0100
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <201002242055.06904.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi>
Message-ID: <1267035169.2929.15037.camel@eki.dlsi.ua.es>

On Wed, 24 Feb 2010 at 20:55 +0200, Harri Pitkänen wrote:
> A few days ago I enabled SourceForge Trac feature for Voikko.
> Currently only
> Trac wiki is used but soon we will start using the ticket feature too. It will
> replace SourceForge tracker for bug reporting.
>
> I wrote two pages related to libvoikko. One containing my plans for the next
> releases
>
> https://sourceforge.net/apps/trac/voikko/wiki/libvoikko/RoadMap
>
> and another listing issues I have found while investigating some of the
> morphologies that have been proposed for use in libvoikko
>
> https://sourceforge.net/apps/trac/voikko/wiki/libvoikko/SupportedLanguages

There are more HFST based morphologies in the Giellatekno SVN:

https://victorio.uit.no/langtech/trunk/st

At least the 'fao' and 'kal' would be usable I think -- although 'kal' is
with Foma, not HFST 'proper'.

Fran

From hatapitk at iki.fi Wed Feb 24 23:21:47 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Wed, 24 Feb 2010 23:21:47 +0200
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <1267035169.2929.15037.camel@eki.dlsi.ua.es>
References: <201002242055.06904.hatapitk@iki.fi>
  <1267035169.2929.15037.camel@eki.dlsi.ua.es>
Message-ID: <201002242321.47994.hatapitk@iki.fi>

On Wednesday 24 February 2010 20:12:49 Francis Tyers wrote:
> There are more HFST based morphologies in the Giellatekno SVN:
>
> https://victorio.uit.no/langtech/trunk/st
>
> At least the 'fao' and 'kal' would be usable I think -- although 'kal'
> is with Foma, not HFST 'proper'.

There appear to be licensing issues with both of them. Fao does not seem to
have any copyright information in it; kal has a copyright notice in one file
(src/twol-kal.txt), and that seems to indicate that the file is proprietary
source from Lingsoft, not GPL.

Initial quality of implementation is not so important to me, but licensing
must be clear from the beginning so that I know we can safely distribute the
material under a free license. Hopefully these issues can be resolved easily
by providing a list of copyright holders and an explicit license text along
with the source files.
Although I have not tested it yet, Foma seems to be OK as a backend library,
as its license is compatible with libvoikko even if it would not be
compatible with the rest of HFST.

Harri

From ftyers at prompsit.com Wed Feb 24 22:29:06 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Wed, 24 Feb 2010 21:29:06 +0100
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <201002242321.47994.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi>
  <1267035169.2929.15037.camel@eki.dlsi.ua.es>
  <201002242321.47994.hatapitk@iki.fi>
Message-ID: <1267043346.2929.15388.camel@eki.dlsi.ua.es>

On Wed, 24 Feb 2010 at 23:21 +0200, Harri Pitkänen wrote:
> On Wednesday 24 February 2010 20:12:49 Francis Tyers wrote:
> > There are more HFST based morphologies in the Giellatekno SVN:
> >
> > https://victorio.uit.no/langtech/trunk/st
> >
> > At least the 'fao' and 'kal' would be usable I think -- although 'kal'
> > is with Foma, not HFST 'proper'.
>
> There appears to be licensing issues with both of them. Fao does not seem to
> have any copyright information in it, kal has copyright notice in one file
> (src/twol-kal.txt) and that seems to indicate the file being proprietary
> source from Lingsoft, not GPL.

For FAO, the lemma list is not free (yet), but the twol rules are. The KAL
one is, as far as I know, GPL -- although you're right that the directories
should contain licences.

> Initial quality of implementation is not so important to me, but licensing
> must be clear from the beginning so that I know we can safely distribute the
> material under a free license. Hopefully these issues can be resolved easily
> by providing a list of copyright holders and an explicit license text along
> with the source files.

Agreed.

> Although I have not tested it yet, Foma seems to be OK as a backend library as
> its license is compatible with libvoikko even if it would not be compatible
> with the rest of HFST.
How hard would it be to add another backend library to libvoikko? We have
quite a few analysers in the Apertium project (with explicit GPL licensing --
mostly v2 or later), and it would be interesting to be able to include them.
Here is a code snippet that calls the library for a given
(platform-independent) compiled binary:

http://wiki.apertium.org/wiki/Lttoolbox#Using_as_a_library

Fran

From hatapitk at iki.fi Wed Feb 24 23:54:28 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Wed, 24 Feb 2010 23:54:28 +0200
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <1267043346.2929.15388.camel@eki.dlsi.ua.es>
References: <201002242055.06904.hatapitk@iki.fi>
  <201002242321.47994.hatapitk@iki.fi>
  <1267043346.2929.15388.camel@eki.dlsi.ua.es>
Message-ID: <201002242354.29198.hatapitk@iki.fi>

On Wednesday 24 February 2010 22:29:06 Francis Tyers wrote:
> How hard would it be to add another backend library to libvoikko ? We
> have quite a few analysers in the Apertium project (with explicit GPL
> licensing -- mostly v2 or later), it would be interesting to be able to
> include them. Here is a code snippet that calls the library for a given
> (platform-independent compiled binary):

For simple needs (spell checking without character case checks and without
spelling suggestions, and a language alphabet that is completely within Latin
Extended-A or lower Unicode ranges), it is quite easy, only a few hours of
work. Anything more than that requires more work. See
src/morphology/HfstAnalyzer.cpp for an example.

If I remember correctly, Apertium is already in Debian. I could take a look
next week and see what can be done.
Harri

From ftyers at prompsit.com Wed Feb 24 22:57:59 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Wed, 24 Feb 2010 21:57:59 +0100
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <201002242354.29198.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi>
  <201002242321.47994.hatapitk@iki.fi>
  <1267043346.2929.15388.camel@eki.dlsi.ua.es>
  <201002242354.29198.hatapitk@iki.fi>
Message-ID: <1267045079.2929.15477.camel@eki.dlsi.ua.es>

On Wed, 24 Feb 2010 at 23:54 +0200, Harri Pitkänen wrote:
> On Wednesday 24 February 2010 22:29:06 Francis Tyers wrote:
> > How hard would it be to add another backend library to libvoikko ? We
> > have quite a few analysers in the Apertium project (with explicit GPL
> > licensing -- mostly v2 or later), it would be interesting to be able to
> > include them. Here is a code snippet that calls the library for a given
> > (platform-independent compiled binary):
>
> For simple needs (spell checking without character case checks and no spelling
> suggestions, language alphabet is completely within Latin Extended-A or lower
> Unicode ranges), it is quite easy, only a few hours of work. Anything more
> than that requires more work. See src/morphology/HfstAnalyzer.cpp for an
> example.
>
> If I remember correctly, Apertium is already in Debian. I could take a look
> next week and see what can be done.

Yep, the package you need is 'lttoolbox' (and the dev package). I'm the
maintainer, so you can direct any questions to me (or to the apertium-stuff
list). If you don't get time next week I'll give it a shot...
Fran

From flammie at iki.fi Thu Feb 25 02:47:53 2010
From: flammie at iki.fi (Flammie Pirinen)
Date: Thu, 25 Feb 2010 02:47:53 +0200
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <201002242055.06904.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi>
Message-ID: <20100225024753.51826012@cockatrice>

On 2010-02-24, Harri Pitkänen wrote:

> Perhaps the only "new" thing is moving
> HFST integration to libvoikko 3.1. This is not a final decision. I
> just believe that given the current release schedule of libvoikko and
> the large changes being made in HFST it is better for me to
> concentrate on fixing issues that affect Finnish users the most.
> Patches for HFST related changes are still accepted and will be
> merged quickly if someone is interested in having the integration
> happen sooner.

The slower integration plan is fine by me, as I wouldn't really mind having
the actual optimized library usable for voikko instead of the heavyweight
library with my leaking code.

...however, for the purpose of testing and integrating the international
voikko I've attached a mostly untested patch, pieced together from throwaway
code I used for experimenting with hyphenation and suggestion algorithms
(it's basically copy-pasted from this batch testing command line tool, the
same tool that produces my comparison tables on the omorfi site).

I assume you may find it usable even with its memory leaks and performance
issues. It includes a proposed FST implementation of the suggestion
mechanism, which I am currently writing a paper or tech doc about. As it
seems fast enough to be usable on my Acer Aspire One, I believe it is one
correct way to implement extendable language specific modules for the
suggestion algorithm (it's also easily convertible from hunspell confusion
tables). It is most likely not usable as is, but it compiles and I hope it
gives you a clue of its purpose.
Also, the spelling part seemingly works; at least strace tells me it loads
the transducers again.

--
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: voikko-hfst-flag-diacritics.patch
Type: text/x-patch
Size: 32879 bytes
Desc: not available
URL:

From hatapitk at iki.fi Thu Feb 25 20:22:16 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Thu, 25 Feb 2010 20:22:16 +0200
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <20100225024753.51826012@cockatrice>
References: <201002242055.06904.hatapitk@iki.fi>
  <20100225024753.51826012@cockatrice>
Message-ID: <201002252022.16880.hatapitk@iki.fi>

On Thursday 25 February 2010, Flammie Pirinen wrote:
> The slower integration plan is fine by me, as I wouldn't really mind
> having the actual optimized library usable for voikko instead of the
> heavyweight library with my leaking code.
>
> ...however for purpose of testing and integrating the international
> voikko I've attached a mostly untested patch

Thanks. I fixed the warnings about unused parameters in SVN so that you do
not have to maintain the diff of those changes anymore.

> I assume you may find it usable even with its memory leaks and
> performance issues. It includes a suggestion of FST implementation of
> suggestion mechanism I am currently writing a paper or tech doc about.

Looks interesting. We should improve the suggestion generation code so that
your suggestion mechanism could actually be plugged in. That should not be
too hard; most of the required abstractions are already there.

I won't merge the patch in its current form, as the FlagDiacritics part
appears to be something that should rather belong to HFST. Additionally, the
indentation does not match the tab indentation used in libvoikko.
I admit that the choice of using tabs for indentation in libvoikko was a
mistake and I'll probably reindent all of the source code using four spaces
instead. Other than that, the patch looks good to me.

Harri

From flammie at iki.fi Thu Feb 25 22:33:42 2010
From: flammie at iki.fi (Flammie Pirinen)
Date: Thu, 25 Feb 2010 22:33:42 +0200
Subject: [libvoikko] Wiki pages related to libvoikko
In-Reply-To: <201002252022.16880.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi>
  <20100225024753.51826012@cockatrice>
  <201002252022.16880.hatapitk@iki.fi>
Message-ID: <20100225223342.0ad2fd6d@cockatrice>

On 2010-02-25, Harri Pitkänen wrote:

> On Thursday 25 February 2010, Flammie Pirinen wrote:
> > I assume you may find it usable even with its memory leaks and
> > performance issues. It includes a suggestion of FST implementation
> > of suggestion mechanism I am currently writing a paper or tech doc
> > about.
>
> Looks interesting. We should improve the suggestion generation code
> so that your suggestion mechanism could actually be plugged into use.
> Should not be too hard, most of the required abstractions are already
> there.

That would be ideal. Going even further, since I think there was provision
for different suggestion algorithms in current voikko, the same can easily be
done by providing different transducers. I suppose at some point the hard
coded file names can be replaced by information read from the .pro file or
something?

> I won't merge the patch in its current form as the FlagDiacritics
> part appears to be something that should rather belong to HFST.

I agree, I'll probably toss it to some corner of the library for now.
Hopefully by the time of the next major version of HFST it will be mainly
moved to the legacy tools and optimized layer.

> Additionally indentation does not match the tab indentation used in
> libvoikko.
> I admit that the choice of using tabs for indentation in
> libvoikko was a mistake and I'll probably reindent all of the source
> code using four spaces instead.

Well, I've always been a supporter of tabs for line-initial indentation, but
the overwhelming majority of projects aren't, so I finally set up my vim to
do GNU style indentation. If you prefer some style that is neither GNU nor
kernel style, but have e.g. a vim cindent modeline for the preferred style,
it would greatly help in delivering properly indented patches.

--
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more!

From hatapitk at iki.fi Sat Feb 27 19:58:36 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Sat, 27 Feb 2010 19:58:36 +0200
Subject: [libvoikko] SuggestionGenerator interface
In-Reply-To: <20100225223342.0ad2fd6d@cockatrice>
References: <201002242055.06904.hatapitk@iki.fi>
  <201002252022.16880.hatapitk@iki.fi>
  <20100225223342.0ad2fd6d@cockatrice>
Message-ID: <201002271958.36429.hatapitk@iki.fi>

On Thursday 25 February 2010, Flammie Pirinen wrote:
> 2010-02-25, Harri Pitkänen wrote:
> > Looks interesting. We should improve the suggestion generation code
> > so that your suggestion mechanism could actually be plugged into use.
> > Should not be too hard, most of the required abstractions are already
> > there.
>
> That would be ideal, or even further, as I think there were provision
> for different suggestion algorithms in current voikko, same can be
> easily done providing different transducers. I suppose at some point
> the hard coded file names can be replaced by information read from
> the .pro file or something?

Yes, that is possible. I refactored the spelling suggestion code in SVN a bit.
If you change your HfstSuggestion class to extend
libvoikko::spellchecker::suggestion::SuggestionGenerator and modify
libvoikko::spellchecker::suggestion::SuggestionGeneratorFactory
appropriately, it should now be possible to actually use the suggestions
generated by HfstSuggestion.

I have not yet implemented the configuration of the suggestion generator
backend in the .pro file. It should be configured in the same way as Speller
and Analyzer are configured, but that does not work yet.

The SuggestionGenerator interface is not very well designed. Both inputs
(the incorrectly spelled word) and outputs (suggestions) are passed using a
single mediating SuggestionStatus object. For historical reasons that class
contains some implementation specific stuff about computational cost. Methods

  const wchar_t * getWord();
  size_t getWordLength();
  void addSuggestion(const wchar_t * newSuggestion, int priority);

are the only ones you will likely need from it when implementing
HfstSuggestion. I'll probably change the interface at some point so that it
will just take a word as input and return a list of suggestions as output.

> > Additionally indentation does not match the tab indentation used in
> > libvoikko. I admit that the choice of using tabs for indentation in
> > libvoikko was a mistake and I'll probably reindent all of the source
> > code using four spaces instead.
>
> Well, I've always been supporter of tabs for line initial indentation,
> but overwhelming majority of projects aren't so I finally set up my vim
> to do GNU style indentation. If you prefer some style that is not GNU
> nor kernel style, but have e.g. vim cindent modeline for the preferred
> style, it would greatly help in delivering properly indented patches.

Unfortunately I do not have modelines available for any editor, as I tend to
switch regularly between different editors. Basically my coding style has
followed Java coding conventions, using tabs for indentation and a tab width
of 4 characters.
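To make the role of the mediating object more concrete, here is a minimal sketch of a generator that reads its input from, and writes its output to, a status object through the three methods quoted above. The SuggestionStatus below is a mock that models only those three methods, the generate() method name on SuggestionGenerator is an assumption, and DropLastCharGenerator is a toy invented for this example; none of this is the actual libvoikko code.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Mock of the mediating object: only the three methods quoted in the
// message are modelled; the storage and constructor are assumptions.
class SuggestionStatus {
public:
    explicit SuggestionStatus(const std::wstring& word) : word_(word) {}
    const wchar_t* getWord() { return word_.c_str(); }
    std::size_t getWordLength() { return word_.length(); }
    void addSuggestion(const wchar_t* newSuggestion, int priority) {
        suggestions.push_back(
            std::make_pair(std::wstring(newSuggestion), priority));
    }
    // Collected (suggestion, priority) pairs, public for easy inspection.
    std::vector<std::pair<std::wstring, int> > suggestions;
private:
    std::wstring word_;
};

// Hypothetical interface; the method name generate() is assumed.
class SuggestionGenerator {
public:
    virtual ~SuggestionGenerator() {}
    virtual void generate(SuggestionStatus* status) const = 0;
};

// Toy generator: proposes the word with its last character removed.
class DropLastCharGenerator : public SuggestionGenerator {
public:
    void generate(SuggestionStatus* status) const {
        std::size_t length = status->getWordLength();
        if (length < 2) {
            return; // nothing sensible to suggest
        }
        std::wstring candidate(status->getWord(), length - 1);
        status->addSuggestion(candidate.c_str(), 1);
    }
};
```

A transducer-based generator would replace the body of generate() with a lookup and call addSuggestion() once per candidate; the simpler word-in, list-out interface mentioned in the message would make the mock status object unnecessary.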
Harri

From hatapitk at iki.fi Sat Feb 27 23:55:13 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Sat, 27 Feb 2010 23:55:13 +0200
Subject: [libvoikko] Lttoolbox (Apertium) morphology backend
In-Reply-To: <201002242354.29198.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi>
  <1267043346.2929.15388.camel@eki.dlsi.ua.es>
  <201002242354.29198.hatapitk@iki.fi>
Message-ID: <201002272355.14338.hatapitk@iki.fi>

On Wednesday 24 February 2010, Harri Pitkänen wrote:
> If I remember correctly, Apertium is already in Debian. I could take a
> look next week and see what can be done.

Well, I didn't wait until next week but implemented the backend already. You
can try this by using the sources in SVN:

- On Debian unstable install packages lttoolbox, liblttoolbox3-3.1-0,
  liblttoolbox3-3.1-0-dev
- Check out SVN sources
- ./autogen.sh
- ./configure --prefix=/some/dir --enable-lttoolbox
- make install
- Copy stuff from http://www.puimula.org/htp/testing/apertium/ to
  ~/.voikko/2/mor-apertium/

Now you should have a dictionary variant "apertium" available. The test
dictionary was copied from
http://wiki.apertium.org/wiki/Lttoolbox#Using_as_a_library
and it recognizes the words "car" and "cars". Even the spelling suggestions
work:

$ /some/dir/bin/voikkospell -d apertium -s
car
C: car
cara
W: cara
S: cars
S: car

I must say I was quite impressed by how easy this was. No compilation
problems, no license hassles (Apertium uses exactly the same license as
libvoikko), easy to use API. API documentation was a bit hard to find though.

I also tried to build some of the real-world dictionaries but could not
figure out which one would work and how it should be used.
Harri

From ftyers at prompsit.com Sat Feb 27 23:01:15 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Sat, 27 Feb 2010 22:01:15 +0100
Subject: [libvoikko] Lttoolbox (Apertium) morphology backend
In-Reply-To: <201002272355.14338.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi>
  <1267043346.2929.15388.camel@eki.dlsi.ua.es>
  <201002242354.29198.hatapitk@iki.fi>
  <201002272355.14338.hatapitk@iki.fi>
Message-ID: <1267304475.2929.23696.camel@eki.dlsi.ua.es>

On Sat, 27 Feb 2010 at 23:55 +0200, Harri Pitkänen wrote:
> On Wednesday 24 February 2010, Harri Pitkänen wrote:
> > If I remember correctly, Apertium is already in Debian. I could take a
> > look next week and see what can be done.
>
> Well, I didn't wait until next week but implemented the backend already. You
> can try this by using the sources in SVN:

Wow great!!

> - On Debian unstable install packages lttoolbox, liblttoolbox3-3.1-0,
>   liblttoolbox3-3.1-0-dev
> - Check out SVN sources
> - ./autogen.sh
> - ./configure --prefix=/some/dir --enable-lttoolbox
> - make install
> - Copy stuff from http://www.puimula.org/htp/testing/apertium/ to
>   ~/.voikko/2/mor-apertium/
>
> Now you should have a dictionary variant "apertium" available. The test
> dictionary was copied from
> http://wiki.apertium.org/wiki/Lttoolbox#Using_as_a_library
> and it recognizes words "car" and "cars". Even the spelling suggestions work:
>
> $ /some/dir/bin/voikkospell -d apertium -s
> car
> C: car
> cara
> W: cara
> S: cars
> S: car
>
>
> I must say I was quite impressed on how easy this was. No compilation
> problems, no license hassles (Apertium uses exactly the same license as
> libvoikko), easy to use API. API documentation was a bit hard to find though.

:)

> I also tried to build some of the real world dictionaries but could not figure
> out which one would work and how it should be used.

You can grab the dictionary from:

http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr/apertium-br-fr.br-fr.dix

and

https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-is-en/apertium-is-en.is.dix

for Breton and Icelandic. The .bin files are compiled by:

$ lt-comp lr

e.g.

lt-comp lr apertium-br-fr.br-fr.dix br.bin

As far as we know these are platform-independent compiled binaries. I've
tried them on several architectures and they "just work"; I'm not 100% sure
about Windows though.

Fran

PS. (to Apertium folk) Now that we have this, we should perhaps think about
marking standard / non-standard forms in some way, or being able to compile
dictionaries as spellers (e.g. without and multiword entries).

From p.ixiemotion at gmail.com Sun Feb 28 01:14:51 2010
From: p.ixiemotion at gmail.com (Kevin Brubeck Unhammer)
Date: Sun, 28 Feb 2010 00:14:51 +0100
Subject: [libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend
In-Reply-To: <1267304475.2929.23696.camel@eki.dlsi.ua.es>
References: <201002242055.06904.hatapitk@iki.fi> <1267043346.2929.15388.camel@eki.dlsi.ua.es> <201002242354.29198.hatapitk@iki.fi> <201002272355.14338.hatapitk@iki.fi> <1267304475.2929.23696.camel@eki.dlsi.ua.es>
Message-ID: <96023f221002271514y2affdf07p1e5eaccc86a249e9@mail.gmail.com>

2010/2/27 Francis Tyers :
> On Sat, 27 Feb 2010 at 23:55 +0200, Harri Pitkänen wrote:
>> On Wednesday 24 February 2010, Harri Pitkänen wrote:
>> > If I remember correctly, Apertium is already in Debian. I could take a
>> > look next week and see what can be done.
>>
>> Well, I didn't wait until next week but implemented the backend already. You
>> can try this by using the sources in SVN:
>
> Wow great!!
>
>> - On Debian unstable install packages lttoolbox, liblttoolbox3-3.1-0,
>>   liblttoolbox3-3.1-0-dev
>> - Check out SVN sources
>> - ./autogen.sh
>> - ./configure --prefix=/some/dir --enable-lttoolbox
>> - make install
>> - Copy stuff from http://www.puimula.org/htp/testing/apertium/ to
>>   ~/.voikko/2/mor-apertium/
>>
>> Now you should have a dictionary variant "apertium" available. The test
>> dictionary was copied from
>> http://wiki.apertium.org/wiki/Lttoolbox#Using_as_a_library
>> and it recognizes the words "car" and "cars". Even the spelling suggestions work:
>>
>> $ /some/dir/bin/voikkospell -d apertium -s
>> car
>> C: car
>> cara
>> W: cara
>> S: cars
>> S: car
>>
>>
>> I must say I was quite impressed by how easy this was. No compilation
>> problems, no license hassles (Apertium uses exactly the same license as
>> libvoikko), easy to use API. API documentation was a bit hard to find though.
>
> :)
>
>> I also tried to build some of the real-world dictionaries but could not figure
>> out which one would work and how it should be used.
>
> You can grab the dictionary from:
>
> http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr/apertium-br-fr.br-fr.dix

I think this should be
http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr/apertium-br-fr.br.dix
for Breton (and not the translational dictionary) ;-)

--
Kevin Brubeck Unhammer

From hatapitk at iki.fi Sun Feb 28 11:00:44 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Sun, 28 Feb 2010 11:00:44 +0200
Subject: [libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend
In-Reply-To: <96023f221002271514y2affdf07p1e5eaccc86a249e9@mail.gmail.com>
References: <201002242055.06904.hatapitk@iki.fi> <1267304475.2929.23696.camel@eki.dlsi.ua.es> <96023f221002271514y2affdf07p1e5eaccc86a249e9@mail.gmail.com>
Message-ID: <201002281100.45122.hatapitk@iki.fi>

On Sunday 28 February 2010, Kevin Brubeck Unhammer wrote:
> I think this should be
> http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr/apertium-br-fr.br.dix
> for Breton (and not the translational dictionary)
> ;-)

There appear to be some differences between these dictionaries. This Breton
dictionary required one line to be removed from its source before it compiled,
but it still did not work at all with Voikko. Probably the analysis results for
unknown words were represented differently than in the example dictionary.

One English dictionary I tested worked partially. Voikko accepted any word
that started with a valid English word but ignored any garbage at the end.

However, the Icelandic dictionary seems to work quite well. Here is a
screenshot from OpenOffice.org showing the first few paragraphs from
Landnámabók (http://www.snerpa.is/net/snorri/landnama.htm)

http://www.puimula.org/htp/tmp/ooo-is.png

I don't know Icelandic at all and therefore can't tell whether some of the
words are accepted or rejected incorrectly.

- Checking initial capitalization does not work, but that was expected since I
  have not implemented such checking yet.
- Perhaps there are some old words that are no longer in use.
- Text language has been configured to Finnish, which causes OOo to consider
  some Icelandic letters as word separators. This leads to some words being only
  partially underlined. Assuming that the breakiterator configuration for
  Icelandic is correct in OOo, this should not happen after we allow
  openoffice.org-voikko to act as a spell checker for Icelandic text.

Harri

From ftyers at prompsit.com Sun Feb 28 11:37:17 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Sun, 28 Feb 2010 10:37:17 +0100
Subject: [libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend
In-Reply-To: <201002281100.45122.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi> <1267304475.2929.23696.camel@eki.dlsi.ua.es> <96023f221002271514y2affdf07p1e5eaccc86a249e9@mail.gmail.com> <201002281100.45122.hatapitk@iki.fi>
Message-ID: <1267349837.2929.25212.camel@eki.dlsi.ua.es>

On Sun, 28 Feb 2010 at 11:00 +0200, Harri Pitkänen wrote:
> On Sunday 28 February 2010, Kevin Brubeck Unhammer wrote:
> > I think this should be
> > http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr/apertium-br-fr.br.dix
> > for Breton (and not the translational dictionary)
> > ;-)
>
> There appear to be some differences between these dictionaries. This Breton
> dictionary required one line to be removed from its source before it compiled,
> but it still did not work at all with Voikko. Probably the analysis results for
> unknown words were represented differently than in the example dictionary.

Ah yes, there is an error in it in SVN, I need to fix that.

> One English dictionary I tested worked partially. Voikko accepted any word
> that started with a valid English word but ignored any garbage at the end.
>
> However, the Icelandic dictionary seems to work quite well.
> Here is a screenshot from OpenOffice.org showing the first few paragraphs from
> Landnámabók (http://www.snerpa.is/net/snorri/landnama.htm)
>
> http://www.puimula.org/htp/tmp/ooo-is.png
>
> I don't know Icelandic at all and therefore can't tell whether some of the
> words are accepted or rejected incorrectly.

Nice, it looks good. Some of the capitalised words should be recognised
correctly, at least 'Bretlandi' and 'Norðmenn'.

> - Checking initial capitalization does not work, but that was expected since I
>   have not implemented such checking yet.
> - Perhaps there are some old words that are no longer in use.
> - Text language has been configured to Finnish, which causes OOo to consider
>   some Icelandic letters as word separators. This leads to some words being only
>   partially underlined. Assuming that the breakiterator configuration for
>   Icelandic is correct in OOo, this should not happen after we allow
>   openoffice.org-voikko to act as a spell checker for Icelandic text.

This is really cool, thanks :)

Fran

From hatapitk at iki.fi Sun Feb 28 21:40:18 2010
From: hatapitk at iki.fi (Harri Pitkänen)
Date: Sun, 28 Feb 2010 21:40:18 +0200
Subject: [libvoikko] Lttoolbox (Apertium) morphology backend
In-Reply-To: <1267349837.2929.25212.camel@eki.dlsi.ua.es>
References: <201002242055.06904.hatapitk@iki.fi> <201002281100.45122.hatapitk@iki.fi> <1267349837.2929.25212.camel@eki.dlsi.ua.es>
Message-ID: <201002282140.18269.hatapitk@iki.fi>

On Sunday 28 February 2010, Francis Tyers wrote:
> > I don't know Icelandic at all and therefore can't tell whether some of
> > the words are accepted or rejected incorrectly.
>
> Nice, it looks good. Some of the capitalised words should be recognised
> correctly, at least 'Bretlandi' and 'Norðmenn'.

I tried to fix the checking of capitalized words but started to run into
problems.
It seems that the library API works in somewhat surprising (at least
to me) ways when you enter a word that starts with a capital letter and ends
with garbage.

The implementation is here
http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup

and test cases here
http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup

I was able to get all test cases except the one with TODO in the method name
implemented. How would you suggest fixing the code so that all tests would
pass? Of course a patch would be most welcome :)

Harri

From ftyers at prompsit.com Sun Feb 28 21:04:27 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Sun, 28 Feb 2010 20:04:27 +0100
Subject: [libvoikko] Lttoolbox (Apertium) morphology backend
In-Reply-To: <201002282140.18269.hatapitk@iki.fi>
References: <201002242055.06904.hatapitk@iki.fi> <201002281100.45122.hatapitk@iki.fi> <1267349837.2929.25212.camel@eki.dlsi.ua.es> <201002282140.18269.hatapitk@iki.fi>
Message-ID: <1267383867.2929.26408.camel@eki.dlsi.ua.es>

On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
> On Sunday 28 February 2010, Francis Tyers wrote:
> > > I don't know Icelandic at all and therefore can't tell whether some of
> > > the words are accepted or rejected incorrectly.
> >
> > Nice, it looks good. Some of the capitalised words should be recognised
> > correctly, at least 'Bretlandi' and 'Norðmenn'.
>
> I tried to fix the checking of capitalized words but started to run into
> problems. It seems that the library API works in somewhat surprising (at least
> to me) ways when you enter a word that starts with a capital letter and ends
> with garbage.
>
> The implementation is here
> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
>
> and test cases here
> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
>
> I was able to get all test cases except the one with TODO in the method name
> implemented. How would you suggest fixing the code so that all tests would
> pass? Of course a patch would be most welcome :)

Hmm, strangely enough, when I try an unknown word I get similar strange
output:

$ ./test mor.bin
^Reykjanghfghesi$ --> ^Reykja/Reykja/Reykur$

It seems that in the 'biltrans' mode, the 'standard' sections are
treated as unconditional, i.e. it just returns the longest match in all
cases.

I will think some more about this.

Fran

From ftyers at prompsit.com Sun Feb 28 21:40:19 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Sun, 28 Feb 2010 20:40:19 +0100
Subject: [libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend
In-Reply-To: <20cf28cd1002281218h4fc0f55w7fe6138d1287a03c@mail.gmail.com>
References: <201002242055.06904.hatapitk@iki.fi> <201002281100.45122.hatapitk@iki.fi> <1267349837.2929.25212.camel@eki.dlsi.ua.es> <201002282140.18269.hatapitk@iki.fi> <1267383867.2929.26408.camel@eki.dlsi.ua.es> <20cf28cd1002281218h4fc0f55w7fe6138d1287a03c@mail.gmail.com>
Message-ID: <1267386019.2929.26483.camel@eki.dlsi.ua.es>

On Sun, 28 Feb 2010 at 21:18 +0100, Jacob Nordfalk wrote:
>
>
> 2010/2/28 Francis Tyers
> > On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
> > > On Sunday 28 February 2010, Francis Tyers wrote:
> > > > > I don't know Icelandic at all and therefore can't tell whether some of
> > > > > the words are accepted or rejected incorrectly.
> > > >
> > > > Nice, it looks good. Some of the capitalised words should be recognised
> > > > correctly, at least 'Bretlandi' and 'Norðmenn'.
> > >
> > > I tried to fix the checking of capitalized words but started to run into
> > > problems. It seems that the library API works in somewhat surprising (at least
> > > to me) ways when you enter a word that starts with a capital letter and ends
> > > with garbage.
> > >
> > > The implementation is here
> > > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
> > >
> > > and test cases here
> > > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
> > >
> > > I was able to get all test cases except the one with TODO in the method name
> > > implemented. How would you suggest fixing the code so that all tests would
> > > pass? Of course a patch would be most welcome :)
> >
> > Hmm, strangely enough, when I try an unknown word I get similar strange
> > output:
> >
> > $ ./test mor.bin
> > ^Reykjanghfghesi$ --> ^Reykja/Reykja/Reykur$
> >
> > It seems that in the 'biltrans' mode, the 'standard' sections are
> > treated as unconditional, i.e. it just returns the longest match in all
> > cases.
> >
> > I will think some more about this.
>
>
> Biltrans must actually work like this.
> I don't understand why you would use biltrans in an analyser.

Because biltrans takes a string, not a FILE*

> In biltrans partial matches are allowed. The symbols (and letters) after
> the match are called the queue.
> For example, the input symbol house
> Matches in the bidix house -> domo and the queue is
> The result is domo

Hmm, ok, so probably we need a new method :(

Fran

From ftyers at prompsit.com Sun Feb 28 22:07:12 2010
From: ftyers at prompsit.com (Francis Tyers)
Date: Sun, 28 Feb 2010 21:07:12 +0100
Subject: [libvoikko] [Apertium-stuff] Lttoolbox (Apertium) morphology backend
In-Reply-To: <20cf28cd1002281247t162da7dfw73e37cb44329966d@mail.gmail.com>
References: <201002242055.06904.hatapitk@iki.fi> <201002281100.45122.hatapitk@iki.fi> <1267349837.2929.25212.camel@eki.dlsi.ua.es> <201002282140.18269.hatapitk@iki.fi> <1267383867.2929.26408.camel@eki.dlsi.ua.es> <20cf28cd1002281218h4fc0f55w7fe6138d1287a03c@mail.gmail.com> <1267386019.2929.26483.camel@eki.dlsi.ua.es> <20cf28cd1002281247t162da7dfw73e37cb44329966d@mail.gmail.com>
Message-ID: <1267387632.2929.26539.camel@eki.dlsi.ua.es>

On Sun, 28 Feb 2010 at 21:47 +0100, Jacob Nordfalk wrote:
>
>
> 2010/2/28 Francis Tyers
> > On Sun, 28 Feb 2010 at 21:18 +0100, Jacob Nordfalk wrote:
> > >
> > >
> > > 2010/2/28 Francis Tyers
> > > > On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
> > > > > On Sunday 28 February 2010, Francis Tyers wrote:
> > > > > > > I don't know Icelandic at all and therefore can't tell whether some of
> > > > > > > the words are accepted or rejected incorrectly.
> > > > > >
> > > > > > Nice, it looks good. Some of the capitalised words should be recognised
> > > > > > correctly, at least 'Bretlandi' and 'Norðmenn'.
> > > > >
> > > > > I tried to fix the checking of capitalized words but started to run into
> > > > > problems. It seems that the library API works in somewhat surprising (at least
> > > > > to me) ways when you enter a word that starts with a capital letter and ends
> > > > > with garbage.
> > > > >
> > > > > The implementation is here
> > > > > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
> > > > >
> > > > > and test cases here
> > > > > http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
> > > > >
> > > > > I was able to get all test cases except the one with TODO in the method name
> > > > > implemented. How would you suggest fixing the code so that all tests would
> > > > > pass? Of course a patch would be most welcome :)
> > > >
> > > > Hmm, strangely enough, when I try an unknown word I get similar strange
> > > > output:
> > > >
> > > > $ ./test mor.bin
> > > > ^Reykjanghfghesi$ --> ^Reykja/Reykja/Reykur$
> > > >
> > > > It seems that in the 'biltrans' mode, the 'standard' sections are
> > > > treated as unconditional, i.e. it just returns the longest match in all
> > > > cases.
> > > >
> > > > I will think some more about this.
> > >
> > >
> > > Biltrans must actually work like this.
> > > I don't understand why you would use biltrans in an analyser.
> >
> > Because biltrans takes a string, not a FILE*
> >
> > > In biltrans partial matches are allowed. The symbols (and letters) after
> > > the match are called the queue.
> > > For example, the input symbol house
> > > Matches in the bidix house -> domo and the queue is
> > > The result is domo
>
> The above is behaviour of biltransWithQueue()

Nah, that doesn't do it :/

The new method should probably just be a copy of the old one, except that it
checks whether all the input has been consumed.

Fran
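
As a rough illustration of the behaviour discussed in this thread, the
difference between returning the longest match and requiring that all of the
input be consumed could be sketched in Python as follows. This is a toy model
only and not lttoolbox code: the lexicon, its entries and both function names
are made up for the example.

```python
# Toy illustration only: this is NOT lttoolbox code. The lexicon and the
# function names are hypothetical and exist just to show the idea.
LEXICON = {"Reykja": "Reykur", "car": "car", "cars": "car"}


def longest_prefix_match(word):
    """Return (lemma, queue) for the longest lexicon entry that prefixes word.

    The queue is the input left over after the match. Accepting a match
    with a non-empty queue is what lets trailing garbage through.
    """
    for end in range(len(word), 0, -1):
        lemma = LEXICON.get(word[:end])
        if lemma is not None:
            return lemma, word[end:]
    return None, word


def full_match(word):
    """Same lookup, but accept only if all of the input was consumed."""
    lemma, queue = longest_prefix_match(word)
    return lemma if queue == "" else None


if __name__ == "__main__":
    for w in ("cars", "Reykjanghfghesi"):
        print(w, longest_prefix_match(w), full_match(w))
```

With the partial lookup, a word such as "Reykjanghfghesi" still yields a lemma
because its prefix is in the toy lexicon; the stricter variant rejects it,
which is the check a spell checker needs.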