[libvoikko] Problems with dictionary generating scripts
Harri Pitkänen
hatapitk at iki.fi
Wed Oct 18 21:41:19 EEST 2017
Hi!
Pekka Kilpeläinen wrote on 2017-10-18 19:41:
> 1: The script appears to require that the </word> end tags are given
> in the XML input of the dictionary at the very start of the line. If
> not, then the script fails to terminate. Of course there's a simple
> work-around, but it is against the spirit of XML to need to be
> concerned with such low-level formatting details.
Right. See https://github.com/voikko/corevoikko/issues/33 for more info
on this.
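
For illustration only, here is a minimal sketch of reading <word> entries with a real XML parser, so that the position of the </word> end tag on the line does not matter. This is not the code in corevoikko; the function name iter_words and the sample data are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_words(source):
    """Yield each complete <word> element from a wordlist XML stream."""
    # iterparse reports an "end" event when an element's end tag is seen,
    # regardless of how the input is split into lines.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "word":
            yield elem
            elem.clear()  # release the element once the caller is done


# Sample input (hypothetical): the </word> end tag is deliberately
# not at the start of a line.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<wordlist xml:lang="fi">
  <word id="w15"><forms><form>bänet</form></forms>
    <classes><wclass>noun</wclass></classes>  </word>
</wordlist>
""".encode("UTF-8")

word_ids = [w.get("id") for w in iter_words(io.BytesIO(SAMPLE))]
forms = [w.findtext("forms/form") for w in iter_words(io.BytesIO(SAMPLE))]
```

Each element is cleared after it has been handed to the caller, so a large word list does not need to be held in memory at once.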
> 2: What is wrong with the following input?
> - - - CLIP - - -
> $ cat vocabulary/joukahainen.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE wordlist SYSTEM "wordlist.dtd">
> <wordlist xml:lang="fi">
> <word id="w15">
> <forms>
> <form>bänet</form>
> </forms>
> <classes>
> <wclass>noun</wclass>
> </classes>
> <inflection>
> <infclass>nalle</infclass>
> </inflection>
> </word>
> </wordlist>
> $ make vvfst-install
> python3 vvfst/generate_lex.py --destdir=vvfst
> Traceback (most recent call last):
>   File "vvfst/generate_lex.py", line 435, in <module>
>     handle_word, True)
>   File "common/voikkoutils.py", line 221, in process_wordlist
>     word_handler(word.documentElement)
>   File "vvfst/generate_lex.py", line 369, in handle_word
>     sys.stderr.write(errorstr.encode("UTF-8"))
> TypeError: write() argument must be str, not bytes
> Makefile:224: recipe for target 'vvfst/joukahainen.lexc.stamp' failed
> make: *** [vvfst/joukahainen.lexc.stamp] Error 1
It seems you are not using the latest version of the code:
https://github.com/voikko/corevoikko/commit/a2a57b04a06020abdd86db89d389a813f8e28a1f
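
The TypeError in the traceback is the usual result of running a Python 2 idiom under Python 3: text streams such as sys.stderr accept str, not bytes. The sketch below only illustrates that general point and does not reproduce the contents of the commit above. The helper report_error and the message text are hypothetical, and io.StringIO stands in for sys.stderr.

```python
import io


def report_error(stream, errorstr):
    """Write an error message to a text stream, Python 3 style."""
    # A text stream takes str and handles the encoding itself,
    # so no explicit .encode("UTF-8") is needed.
    stream.write(errorstr + "\n")


text_stream = io.StringIO()  # stands in for sys.stderr in this sketch
message = "error while handling word: bänet"  # hypothetical message

# The idiom from the traceback: passing bytes to a text stream fails.
try:
    text_stream.write(message.encode("UTF-8"))
    old_idiom_failed = False
except TypeError:
    old_idiom_failed = True

# Passing the str directly works.
report_error(text_stream, message)
```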
With the latest code you should get an error about an invalid inflection
class, because you are not supposed to use the plural form
<form>bänet</form>. If singular forms are not allowed for a word, mark
that with <inflection><flag>ei_yks</flag></inflection>.
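
As a rough sketch, the flag could be added to an entry programmatically like this. The helper mark_no_singular is hypothetical and not part of the Voikko tools, and it assumes that <flag> sits next to <infclass> inside the same <inflection> element. It only adds the flag; the <form> element of the entry is left out here and would still need to be corrected by hand.

```python
import xml.etree.ElementTree as ET


def mark_no_singular(word):
    """Add <flag>ei_yks</flag> to the <inflection> element of a <word>."""
    inflection = word.find("inflection")
    if inflection is None:
        inflection = ET.SubElement(word, "inflection")
    # Avoid adding the same flag twice.
    existing = [flag.text for flag in inflection.findall("flag")]
    if "ei_yks" not in existing:
        ET.SubElement(inflection, "flag").text = "ei_yks"
    return word


word = ET.fromstring(
    '<word id="w15"><inflection><infclass>nalle</infclass></inflection></word>'
)
mark_no_singular(word)
result = ET.tostring(word, encoding="unicode")
```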
Harri