简体   繁体   中英

Java, UTF-8, and Windows console

We try to use Java and UTF-8 on Windows. The application writes logs on the console, and we would like to use UTF-8 for the logs as our application has internationalized logs.

It is possible to configure the JVM so it generates UTF-8, using -Dfile.encoding=UTF-8 as arguments to the JVM. It works fine, but the output on a Windows console is garbled.

Then, we can set the code page of the console to 65001 ( chcp 65001 ), but in this case, the .bat files do not work. This means that when we try to launch our application through our script (named start.bat), absolutely nothing happens. The command simple returns:

C:\Application> chcp 65001
Activated code page: 65001
C:\Application> start.bat

C:\Application>

But without chcp 65001 , there is no problem, and the application can be launched.

Any hints about that?

Try chcp 65001 && start.bat

The chcp command changes the code page, and 65001 is the Win32 code page identifier for UTF-8 under Windows 7 and up. A code page, or character encoding, specifies how to convert a Unicode code point to a sequence of bytes or back again.

Java on windows does NOT support unicode ouput by default. I have written a workaround method by calling Native API with JNA library.The method will call WriteConsoleW for unicode output on the console.

import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.win32.StdCallLibrary;

/** For unicode output on windows platform
 * @author Sandy_Yin
 * 
 */
public class Console {
    private static Kernel32 INSTANCE = null;

    public interface Kernel32 extends StdCallLibrary {
        public Pointer GetStdHandle(int nStdHandle);

        public boolean WriteConsoleW(Pointer hConsoleOutput, char[] lpBuffer,
                int nNumberOfCharsToWrite,
                IntByReference lpNumberOfCharsWritten, Pointer lpReserved);
    }

    static {
        String os = System.getProperty("os.name").toLowerCase();
        if (os.startsWith("win")) {
            INSTANCE = (Kernel32) Native
                    .loadLibrary("kernel32", Kernel32.class);
        }
    }

    public static void println(String message) {
        boolean successful = false;
        if (INSTANCE != null) {
            Pointer handle = INSTANCE.GetStdHandle(-11);
            char[] buffer = message.toCharArray();
            IntByReference lpNumberOfCharsWritten = new IntByReference();
            successful = INSTANCE.WriteConsoleW(handle, buffer, buffer.length,
                    lpNumberOfCharsWritten, null);
            if(successful){
                System.out.println();
            }
        }
        if (!successful) {
            System.out.println(message);
        }
    }
}

We had some similar problems in Linux. Our code was in ISO-8859-1 (mostly cp-1252 compatible) but the console was UTF-8, making the code to not compile. Simply changing the console to ISO-8859-1 would make the build script, in UTF-8, to break. We found a couple of choices:
1- define some standard encoding and sticky to it. That was our choice. We choose to keep all in ISO-8859-1, modifying the build scripts.
2- Setting the encoding before starting any task, even inside the build scripts. Some code like the erickson said. In Linux was like :

lang=pt_BR.ISO-8859-1 /usr/local/xxxx

My eclipse is still like this. Both do work well.

您是否尝试过PowerShell而不是旧的 cmd.exe。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM