[libvoikko] Problems with dictionary generating scripts

Harri Pitkänen hatapitk at iki.fi
Wed Oct 18 21:41:19 EEST 2017


Hi!

Pekka Kilpeläinen wrote on 2017-10-18 19:41:
> 1: The script appears to require that the </word> end tags are given
> in the XML input of the dictionary at the very start of the line. If
> not, then the script fails to terminate. Of course there's a simple
> work-around, but it is against the spirit of XML to need to be
> concerned with such low-level formatting details.

Right. See https://github.com/voikko/corevoikko/issues/33 for more info
on this.
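
For illustration only, here is a minimal sketch of reading <word> entries
with a real XML parser, so that the position of </word> on the line does
not matter. It is not the code in common/voikkoutils.py and not
necessarily how the issue above will be resolved; the function name
iter_words and the sample words are made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_words(source):
    """Yield each complete <word> element from an XML word list.

    The parser reports an element once its end tag has been read,
    so indentation and line breaks around </word> are irrelevant.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "word":
            yield elem
            # Release the element's contents once the caller is done with it.
            elem.clear()


# Hypothetical sample: one </word> is indented, the other shares a line
# with the next start tag.
SAMPLE = (
    b'<wordlist xml:lang="fi">\n'
    b'<word id="w1"><forms><form>nalle</form></forms>\n'
    b'    </word><word id="w2">\n'
    b'<forms><form>kissa</form></forms></word>\n'
    b'</wordlist>\n'
)

if __name__ == "__main__":
    for word in iter_words(io.BytesIO(SAMPLE)):
        print(word.get("id"), word.findtext("forms/form"))
```

Each element must be used before the generator is resumed, because it is
cleared afterwards to keep memory use flat on a large vocabulary file.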

> 2: What is wrong with the following input?
> - - - CLIP - - -
> $ cat vocabulary/joukahainen.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE wordlist SYSTEM "wordlist.dtd">
> <wordlist xml:lang="fi">
> <word id="w15">
>       <forms>
>          <form>bänet</form>
>       </forms>
>       <classes>
>          <wclass>noun</wclass>
>       </classes>
>       <inflection>
>          <infclass>nalle</infclass>
>       </inflection>
> </word>
> </wordlist>
> $ make vvfst-install
> python3 vvfst/generate_lex.py  --destdir=vvfst
> Traceback (most recent call last):
>   File "vvfst/generate_lex.py", line 435, in <module>
>     handle_word, True)
>   File "common/voikkoutils.py", line 221, in process_wordlist
>     word_handler(word.documentElement)
>   File "vvfst/generate_lex.py", line 369, in handle_word
>     sys.stderr.write(errorstr.encode("UTF-8"))
> TypeError: write() argument must be str, not bytes
> Makefile:224: recipe for target 'vvfst/joukahainen.lexc.stamp' failed
> make: *** [vvfst/joukahainen.lexc.stamp] Error 1

It seems you are not using the latest version of the code:

https://github.com/voikko/corevoikko/commit/a2a57b04a06020abdd86db89d389a813f8e28a1f
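
The TypeError in the traceback is the usual Python 3 str/bytes mismatch:
sys.stderr is a text stream, so it accepts str and rejects the bytes
returned by .encode("UTF-8"). The sketch below shows the general pattern
only. It does not reproduce the commit above, and report_error is a
hypothetical helper name.

```python
import io
import sys


def report_error(message, stream=None):
    """Write an error message to a text stream (sys.stderr by default).

    In Python 3 the message is passed as str; the stream handles the
    encoding itself, so no explicit .encode() call is needed.
    """
    if stream is None:
        stream = sys.stderr
    stream.write(message + "\n")


def fails_with_bytes(stream):
    """Return True if writing encoded bytes to a text stream raises TypeError."""
    try:
        stream.write("virhe".encode("UTF-8"))
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    report_error("Invalid inflection class")
    print(fails_with_bytes(io.StringIO()))
```

If raw UTF-8 bytes really are wanted on standard error, the standard
stream also exposes a binary layer as sys.stderr.buffer.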

With the latest code you should get an error about an invalid inflection
class instead, because you are not supposed to use the plural form
<form>bänet</form>. If singular forms are not allowed for a word, you
mark that with <inflection><flag>ei_yks</flag></inflection>.

Harri

