[libvoikko] Problems with dictionary generating scripts
Pekka Kilpeläinen
pekka.kilpelainen at profium.com
Wed Oct 18 19:41:46 EEST 2017
I'm facing difficulties with the vvfst dictionary generating scripts, esp. generate_lex.py:
1: The script appears to require that every </word> end tag in the dictionary's XML input sits at the very start of a line. If one does
not, the script fails to terminate. There is a simple workaround, of course, but having to care about such low-level formatting details
goes against the spirit of XML.
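
To illustrate point 1: a minimal sketch, not taken from generate_lex.py or voikkoutils.py, of reading <word> entries with a streaming XML parser so that the position of </word> within a line does not matter. The sample input and the helper name iter_words are made up for this example; only the element names come from the word list shown in this message.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: the </word> end tag is NOT at the start of a line.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<wordlist xml:lang="fi">
<word id="w15"><forms><form>bänet</form></forms>
<classes><wclass>noun</wclass></classes>
<inflection><infclass>nalle</infclass></inflection></word>
</wordlist>
"""

def iter_words(xml_bytes):
    """Yield (id, first form) for each <word>, independent of line layout."""
    # iterparse reports an "end" event once an element has been read
    # completely, wherever its end tag happens to be in the text.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "word":
            yield elem.get("id"), elem.findtext("forms/form")
            elem.clear()  # release the children of entries already handled

words = list(iter_words(SAMPLE.encode("UTF-8")))
print(words)
```

Whether the actual scripts could adopt something like this depends on how they split the word list today, which I have not checked in detail.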
2: What is wrong with the following input?
- - - CLIP - - -
$ cat vocabulary/joukahainen.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE wordlist SYSTEM "wordlist.dtd">
<wordlist xml:lang="fi">
<word id="w15">
<forms>
<form>bänet</form>
</forms>
<classes>
<wclass>noun</wclass>
</classes>
<inflection>
<infclass>nalle</infclass>
</inflection>
</word>
</wordlist>
$ make vvfst-install
python3 vvfst/generate_lex.py --destdir=vvfst
Traceback (most recent call last):
  File "vvfst/generate_lex.py", line 435, in <module>
    handle_word, True)
  File "common/voikkoutils.py", line 221, in process_wordlist
    word_handler(word.documentElement)
  File "vvfst/generate_lex.py", line 369, in handle_word
    sys.stderr.write(errorstr.encode("UTF-8"))
TypeError: write() argument must be str, not bytes
Makefile:224: recipe for target 'vvfst/joukahainen.lexc.stamp' failed
make: *** [vvfst/joukahainen.lexc.stamp] Error 1
- - - CLIP - - -
('bänet' is a word of inflection type 8 (nalle, nallen, nallea, ...) according to the Finnish word-list kotus-sanalista_v1.xml by the Institute for the Languages of Finland.)
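
As for the TypeError itself: in Python 3 a text stream such as sys.stderr accepts only str, so writing the result of errorstr.encode("UTF-8") fails before the intended error message is ever shown. Below is a minimal sketch of the difference, assuming the message is an ordinary str. The helper report_error is hypothetical and is not part of generate_lex.py, and io.StringIO merely stands in for sys.stderr.

```python
import io

def report_error(stream, errorstr):
    """Write an error message to a text stream.

    Text streams in Python 3 take str, so the message is passed through
    without encoding (hypothetical helper, not from generate_lex.py).
    """
    stream.write(errorstr)

fake_stderr = io.StringIO()  # stands in for sys.stderr in this sketch

# Reproduce the kind of failure seen in the traceback: bytes given to a
# text stream raise TypeError (the exact wording differs between stream types).
try:
    fake_stderr.write("bänet".encode("UTF-8"))
    failed = False
except TypeError:
    failed = True

# Writing the str directly works.
report_error(fake_stderr, "error: bänet\n")
print(failed, repr(fake_stderr.getvalue()))
```

If the script really needs to emit raw UTF-8 bytes, sys.stderr.buffer.write() accepts bytes on the standard stream. Either way, the actual message that handle_word tried to print stays hidden behind this exception, so the underlying complaint about the input is still unknown.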
Regards, Pekka K.
--
Pekka Kilpeläinen
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 8559 8000 Fax. +358 (0)9 8559 8002
Mob. +358 (0)50 5814 194 Internet: www.profium.com