[libvoikko] Moving support for HFST spellers from experimental to stable

"Harri Pitkänen" hatapitk at iki.fi
Sun Jan 6 18:59:30 EET 2013


Hi!

There has recently been a lot of activity around implementing HFST
morphologies for various languages. If I recall correctly most of the
problems with spellers built from these have been with spelling correction
(either it has been slow or missed corrections that Hunspell would have
provided). But the last time I seriously looked at this was over a year ago,
so maybe this is no longer an issue?

In any case, since many of the languages being worked on don't have
Hunspell dictionaries, I believe it is finally time to promote HFST
spellers to officially supported status within libvoikko. I want to do
this for the next release. I can see three possible options:

1) HFST spellers are installed by placing suitable zhfst speller archives
under ~/.voikko/3/
2) HFST spellers are installed by placing a metadata file (voikko.pro),
HFST acceptor (spl.hfstol), and HFST error model (err.hfstol) under a
subdirectory of ~/.voikko/3/. Each language would have its own
subdirectory.
3) A combination of 1 and 2; that is, we would use both a zhfst speller and
a separate metadata file (this is essentially how it works right now).
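
To make option 2 concrete, the directory layout it describes could be
scanned roughly as follows. This is only a minimal sketch in Python, not
libvoikko code: the function name find_option2_spellers is hypothetical,
and the only things taken from the description above are the three file
names and the one-subdirectory-per-language layout.

```python
from pathlib import Path

# File names as listed in option 2 above.
REQUIRED_FILES = ("voikko.pro", "spl.hfstol", "err.hfstol")


def find_option2_spellers(base_dir):
    """Return {subdirectory name: Path} for every complete speller directory.

    Hypothetical helper: a subdirectory counts as a speller only if all
    three required files are present in it.
    """
    base = Path(base_dir).expanduser()
    if not base.is_dir():
        return {}
    found = {}
    for sub in sorted(base.iterdir()):
        if sub.is_dir() and all((sub / name).is_file() for name in REQUIRED_FILES):
            found[sub.name] = sub
    return found
```

Calling find_option2_spellers("~/.voikko/3") would then list the
subdirectories that contain a full set of files; incomplete ones are
skipped rather than reported as errors, which is a design choice of this
sketch only.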

I don't want to go with option 3 as it is the most complicated one and
requires duplicating the speller metadata. Option 1 would be the most
convenient for the users. Unfortunately, I don't know whether reading the
XML metadata from zhfst spellers is possible with the current version of
hfst-ospell. I see a method called metadata_dump(), but that is not
suitable for this purpose.
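
For reference, this is roughly what reading the metadata for option 1
involves when done without hfst-ospell. It is a sketch under assumptions,
written in Python with the standard library only: it assumes that a zhfst
archive is an ordinary ZIP file and that the XML metadata is stored in a
member named index.xml. That member name and the function name
read_zhfst_metadata are assumptions for illustration, and nothing here
uses or describes the hfst-ospell API.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_zhfst_metadata(archive_path, member="index.xml"):
    """Parse the XML metadata member of a ZIP-based speller archive.

    The default member name is an assumption; pass another name if the
    archive stores its metadata elsewhere.
    """
    with zipfile.ZipFile(archive_path) as archive:
        if member not in archive.namelist():
            raise FileNotFoundError(f"{member} not found in {archive_path}")
        with archive.open(member) as stream:
            # Returns the root element; interpreting its children is left
            # to the caller, since the schema is not assumed here.
            return ET.parse(stream).getroot()
```

The sketch also shows why option 1 brings in both a ZIP and an XML
dependency: each step needs its own library.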

If anyone has time to implement option 1 during the next few months, that
would be great. Otherwise I will most likely proceed with option 2 as
everything needed for it is mostly done and we would not need to depend on
XML and ZIP libraries.

Harri



