[voikko] Malaga-fi - Finnish plugin for Nutch

Hannu Väisänen hvaisane at joyx.joensuu.fi
Mon Jun 29 08:21:04 EEST 2009


Malaga-fi is a Nutch plugin for indexing documents written in Finnish.


Malaga-fi analyses each word morphologically, converts it to its base
form (the form listed in dictionaries) and indexes that base form, so
that a search for the base form finds every inflected form of the
word.

To use an English example, a search for the word "give" finds all
documents that contain "give", "gives", "gave", "given", or "giving".
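
The idea can be illustrated with a small Java sketch. This is only a
toy under stated assumptions: the class name BaseFormIndexSketch, its
methods and the hard-coded lookup table are hypothetical stand-ins for
a real morphological analyser, and none of it is the Malaga-Java or
Nutch API. It shows only why indexing base forms lets one query match
every inflection.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BaseFormIndexSketch {
    // Toy stand-in for a morphological analyser (assumption: a real
    // analyser such as Malaga would compute this, not look it up).
    private static final Map<String, String> BASE_FORMS = new HashMap<>();
    static {
        BASE_FORMS.put("give", "give");
        BASE_FORMS.put("gives", "give");
        BASE_FORMS.put("gave", "give");
        BASE_FORMS.put("given", "give");
        BASE_FORMS.put("giving", "give");
    }

    // Inverted index: base form -> ids of documents that contain
    // any inflected form of that word.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Return the base form of a word; unknown words map to themselves.
    public static String baseForm(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        return BASE_FORMS.getOrDefault(lower, lower);
    }

    // Split the text into words and index each word under its base form.
    public void addDocument(int docId, String text) {
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(baseForm(token), k -> new TreeSet<>())
                 .add(docId);
        }
    }

    // Reduce the query to its base form too, then look it up.
    public Set<Integer> search(String query) {
        return index.getOrDefault(baseForm(query),
                                  Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        BaseFormIndexSketch idx = new BaseFormIndexSketch();
        idx.addDocument(1, "She gave him a book.");
        idx.addDocument(2, "Giving is better than receiving.");
        idx.addDocument(3, "Nothing relevant here.");
        System.out.println(idx.search("give")); // prints [1, 2]
    }
}
```

Because the query is reduced to its base form in the same way as the
indexed words, searching for "gave" or "giving" in this sketch returns
the same documents as searching for "give".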

This is especially important for Finnish, because a single Finnish
word can have tens of thousands of inflected forms.


What you need:

1. Malaga programming language.
   http://home.arcor.de/bjoern-beutel/malaga/


2. Suomimalaga - Description of Finnish morphology written in Malaga.
   http://sourceforge.net/project/showfiles.php?group_id=156731

   The newest version is available from Subversion:
   svn co https://voikko.svn.sourceforge.net/svnroot/voikko/trunk/suomimalaga


3. Malaga-Java - Java interface to Malaga.
   http://joyds1.joensuu.fi/programs/index.html

   Malaga-Java comes in two versions, both distributed in the same
   file. You need the thread-safe version.


4. Malaga-fi - Nutch plugin for documents written in Finnish.
   http://joyds1.joensuu.fi/programs/index.html


5. Nutch: http://lucene.apache.org/nutch/



Malaga-fi is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.


