[libvoikko] How to contribute to create Finnish VFST dictionary

Harri Pitkänen hatapitk at iki.fi
Mon Sep 26 19:43:07 EEST 2016


Hi!

Taito Horiuchi kirjoitti 2016-09-23 12:28:
> I could not find the instruction how to contribute to create Finnish
> VFST dictionary.
> I could not even find where it is maintained.

The Finnish VFST dictionary is called voikko-fi, and it is maintained as
part of the corevoikko repository at

   https://github.com/voikko/corevoikko

> Can somebody tell me:
> 
> 1) How to create your own dictionary.

Do you intend to create your own Finnish dictionary by extending
voikko-fi? If that is what you want to do, you can start here:

   https://github.com/voikko/corevoikko/tree/master/voikko-fi

Build the dictionary by running "make vvfst" and install it locally by
running (for example) "make vvfst-install DESTDIR=~/.voikko". Once you
have that working, you can start making your own modifications. Most of
the vocabulary is in "vocabulary/joukahainen.xml", and the rest of the
morphology is built from the files under the "vvfst" subdirectory.


Or do you want to create a dictionary for another language? Then you can 
pick any tool you like to create a finite state morphology for the 
language. The only requirement is that the transducer can be exported to 
AT&T format. You can then convert it into a VFST dictionary by using the 
"voikkovfstc" tool.
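
As a rough illustration of what AT&T format looks like (a toy sketch
only, not part of voikko-fi): each transition is written as one
tab-separated line with source state, target state, input symbol and
output symbol, and each final state is listed on a line of its own. The
helper name to_att and the example word are made up for this
illustration, and the output symbols a real Voikko morphology would
need are not shown here.

```python
def to_att(transitions, final_states):
    """Serialize an unweighted transducer to AT&T text format.

    transitions:  iterable of (source, target, input_symbol, output_symbol)
    final_states: iterable of state numbers
    """
    # One tab-separated line per transition.
    lines = ["\t".join(str(field) for field in t) for t in transitions]
    # One line per final state, containing only the state number.
    lines += [str(state) for state in final_states]
    return "\n".join(lines) + "\n"


# Hypothetical toy transducer: accepts the single word "koira" and maps
# every letter to itself. State 0 is the start state.
word = "koira"
transitions = [(i, i + 1, ch, ch) for i, ch in enumerate(word)]
att_text = to_att(transitions, [len(word)])

print(att_text, end="")
```

In practice the morphology tool you choose would produce this file for
you; the sketch is only meant to show the shape of the text that is
later handed to "voikkovfstc".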

> 2) How to contribute to create dictionary by adding new word.

Finnish dictionary (voikko-fi): The word list is maintained using the
web application "Joukahainen". The process of adding new words is
described at

   http://joukahainen.puimula.org/docs/

For other languages: the tools and processes differ from language to
language. You can ask here if you are interested in a specific
language.


Hopefully I understood your questions correctly. If I missed something,
please let me know and I will happily provide more details.

Harri

