[libvoikko] Problems with dictionary generating scripts

Pekka Kilpeläinen pekka.kilpelainen at profium.com
Wed Oct 18 19:41:46 EEST 2017


I'm having difficulties with the vvfst dictionary generation scripts, especially generate_lex.py:

1: The script appears to require that each </word> end tag in the XML dictionary input is placed at the very start of a line. If it
is not, the script never terminates. There is a simple workaround, of course, but having to care about such low-level formatting
details goes against the spirit of XML.
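
One way to make word extraction independent of line breaks and indentation is to let an XML parser locate the <word> elements, rather than scanning lines for "</word>". Below is a minimal sketch using the standard-library xml.etree.ElementTree.iterparse. iter_words is a hypothetical helper, not part of generate_lex.py or common/voikkoutils.py, and the fields it extracts are assumed from the sample entry in point 2.

```python
import xml.etree.ElementTree as ET


def iter_words(source):
    """Yield one dict per <word> element of a Joukahainen-style word list.

    Hypothetical helper (not the libvoikko API). `source` is a filename or a
    binary file object. Because the XML parser finds the elements, it does not
    matter where line breaks or indentation fall in the file.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "word":
            continue
        # Assumed fields, taken from the sample entry in this post.
        yield {
            "id": elem.get("id"),
            "forms": [f.text for f in elem.iter("form")],
            "classes": [c.text for c in elem.iter("wclass")],
            "infclasses": [i.text for i in elem.iter("infclass")],
        }
        # Free the parsed subtree; the dict above already holds its data.
        elem.clear()


# Example use:
#   for word in iter_words("vocabulary/joukahainen.xml"):
#       print(word)
```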

2: What is wrong with the following input? 
- - - CLIP - - -
$ cat vocabulary/joukahainen.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE wordlist SYSTEM "wordlist.dtd">
<wordlist xml:lang="fi">
<word id="w15">
      <forms>
         <form>bänet</form>
      </forms>
      <classes>
         <wclass>noun</wclass>
      </classes>
      <inflection>
         <infclass>nalle</infclass>
      </inflection>
</word>
</wordlist>
$ make vvfst-install
python3 vvfst/generate_lex.py  --destdir=vvfst
Traceback (most recent call last):
  File "vvfst/generate_lex.py", line 435, in <module>
    handle_word, True)
  File "common/voikkoutils.py", line 221, in process_wordlist
    word_handler(word.documentElement)
  File "vvfst/generate_lex.py", line 369, in handle_word
    sys.stderr.write(errorstr.encode("UTF-8"))
TypeError: write() argument must be str, not bytes
Makefile:224: recipe for target 'vvfst/joukahainen.lexc.stamp' failed
make: *** [vvfst/joukahainen.lexc.stamp] Error 1
- - - CLIP - - -
('bänet' is a word of inflection type 8 (nalle, nallen, nallea, ...) according to the Finnish word list kotus-sanalista_v1.xml published by the Institute for the Languages of Finland.)
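
The traceback does not show what generate_lex.py objected to in this entry. The crash happens on line 369, in the statement that was trying to report the problem. Under Python 3, sys.stderr is a text stream, so write() accepts str but not the bytes returned by errorstr.encode("UTF-8"), which is a Python 2 idiom. Below is a minimal sketch of the kind of change that line would need, assuming errorstr is already a str. report_error is a hypothetical name used only for illustration.

```python
import sys


def report_error(errorstr):
    """Hypothetical stand-in for the error report on line 369 of generate_lex.py."""
    # Python 3: sys.stderr is a text stream, so write the str itself.
    # Encoding it to bytes first (the Python 2 idiom) raises
    # "TypeError: write() argument must be str, not bytes".
    if isinstance(errorstr, bytes):
        # Assumption: tolerate callers that still pass UTF-8 encoded bytes.
        errorstr = errorstr.decode("UTF-8")
    sys.stderr.write(errorstr)
```

With a change like this, the script should print its actual complaint about the entry instead of the TypeError. In Python 3, sys.stderr uses the backslashreplace error handler, so the 'ä' in "bänet" cannot cause an encoding error there either.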


Regards, Pekka K.
--
Pekka Kilpeläinen
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 8559 8000 Fax. +358 (0)9 8559 8002
Mob. +358 (0)50 5814 194 Internet: www.profium.com

