[libvoikko] Possible encoding bug in latest voikkospell

Harri Pitkänen hatapitk at iki.fi
Thu Sep 25 22:59:02 EEST 2014


On Thursday 25 September 2014 20:53:07 Sjur Moshagen wrote:
> 25. sep. 2014 kl. 20:00 skrev Harri Pitkänen <hatapitk at iki.fi>:
> > What does "locale" command print
> 
> $ locale
> LANG="no_NO.UTF-8"
> LC_COLLATE="no_NO.UTF-8"
> LC_CTYPE="no_NO.UTF-8"
> LC_MESSAGES="no_NO.UTF-8"
> LC_MONETARY="no_NO.UTF-8"
> LC_NUMERIC="no_NO.UTF-8"
> LC_TIME="no_NO.UTF-8"
> LC_ALL="no_NO.UTF-8"

That looks correct to me and should work. Here on Linux I have:

$ locale
LANG=fi_FI.UTF-8
LANGUAGE=
LC_CTYPE="fi_FI.UTF-8"
LC_NUMERIC="fi_FI.UTF-8"
LC_TIME="fi_FI.UTF-8"
LC_COLLATE="fi_FI.UTF-8"
LC_MONETARY="fi_FI.UTF-8"
LC_MESSAGES="fi_FI.UTF-8"
LC_PAPER="fi_FI.UTF-8"
LC_NAME="fi_FI.UTF-8"
LC_ADDRESS="fi_FI.UTF-8"
LC_TELEPHONE="fi_FI.UTF-8"
LC_MEASUREMENT="fi_FI.UTF-8"
LC_IDENTIFICATION="fi_FI.UTF-8"
LC_ALL=

$ voikkospell --version
voikkospell version 3.7.1
libvoikko version 3.7.1

$ echo giellla | voikkospell -s -d se
W: giellla
S: giella
S: giellal
S: giellala
S: giellula
S: giellá

No problems here.
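The test above only works if the C library decodes the UTF-8 bytes of words such as giellá according to LC_CTYPE. voikkospell works on wide characters (the patch context shows a wchar_t line buffer and fwide(stdout, 1)), so that decoding step can be checked in isolation. This is a minimal sketch, assuming a POSIX system. decodedLength and useLocale are hypothetical helpers, not part of libvoikko. The locale name C.UTF-8 is also an assumption: it exists on many Linux systems, while on OS X one of the no_NO.UTF-8 style names from the locale output above would be used instead.

```cpp
#include <clocale>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: switch the C library locale, report success.
bool useLocale(const char* name) {
    return std::setlocale(LC_ALL, name) != nullptr;
}

// Hypothetical helper: count how many wide characters the C library
// decodes from a narrow (multibyte) string under the current LC_CTYPE.
// Returns -1 if the bytes are invalid in that encoding.
// With a null destination, mbsrtowcs only counts and stores nothing.
long decodedLength(const char* bytes) {
    std::mbstate_t state{};       // start in the initial conversion state
    const char* src = bytes;
    std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    return n == static_cast<std::size_t>(-1) ? -1 : static_cast<long>(n);
}
```

In a UTF-8 locale, "giell\xC3\xA1" (giellá as UTF-8 bytes) should decode to 6 wide characters. Any other result means the locale the program actually runs with is not UTF-8, whatever the terminal is set to.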

> > and what are the settings of your terminal
> > window?
> 
> I use Apple’s default settings, which means UTF-8 is the encoding of the
> terminal. Is there any other setting I should pay attention to?

I don't know; I was just wondering whether any of them might be
relevant.

Anyway, maybe there really is a difference between the C locale (set by
setlocale) and the C++ global locale (set by std::locale::global) on OS X.
Please test if the patch below helps.

Harri


diff --git a/libvoikko/src/tools/voikkospell.cpp b/libvoikko/src/tools/voikkospell.cpp
index 18bd336..93ab4ec 100644
--- a/libvoikko/src/tools/voikkospell.cpp
+++ b/libvoikko/src/tools/voikkospell.cpp
@@ -27,6 +27,7 @@
  *********************************************************************************/
 
 #include "../voikko.h"
+#include <locale>
 #include <cstdlib>
 #include <cstdio>
 #include <cstring>
@@ -480,7 +481,7 @@ int main(int argc, char ** argv) {
        wchar_t * line = new wchar_t[MAX_WORD_LENGTH + 1];
        
        // Use stdout in wide character mode and stderr in narrow character mode.
-       setlocale(LC_ALL, "");
+       locale::global(locale(""));
        fwide(stdout, 1);
        fwide(stderr, -1);
        initThreads();
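The two calls in the patch are not interchangeable. std::setlocale() changes only the C library locale and leaves the global C++ locale at "C". std::locale::global() replaces the global C++ locale and, because std::locale("") is a named locale, also sets the C locale as if by setlocale(LC_ALL, name). This is a minimal sketch for observing that difference on a given machine, assuming a C++11 compiler. The helper names cLocaleName, cxxLocaleName, initLocaleC and initLocaleCxx are hypothetical and not part of voikkospell. The fallback when std::locale("") throws is also an assumption of this sketch; the patch does not include it.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Name of the current C library locale (what printf, wprintf and
// mbstowcs consult). Passing nullptr only queries, it changes nothing.
std::string cLocaleName() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? std::string(name) : std::string();
}

// Name of the current global C++ locale (what std::locale() copies).
std::string cxxLocaleName() {
    return std::locale().name();
}

// As before the patch: only the C locale follows the environment
// (LC_ALL, LC_CTYPE, LANG, ...). The global C++ locale stays "C".
void initLocaleC() {
    std::setlocale(LC_ALL, "");
}

// As in the patch: replace the global C++ locale with the environment's.
// std::locale::global() also updates the C locale for named locales.
// Hypothetical fallback: if the C++ library cannot load the environment's
// locale, std::locale("") throws; then keep the classic "C" locale.
bool initLocaleCxx() {
    try {
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        std::locale::global(std::locale::classic());
        return false;
    }
}
```

Calling initLocaleC() and then printing cLocaleName() and cxxLocaleName() should show the C locale following the environment while the C++ one remains "C". After initLocaleCxx(), both should reflect the environment. If initLocaleCxx() returns false on OS X, that alone would suggest the C++ runtime there cannot load the locale named in the environment.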


