
Java: Runtime.exec() and Unicode symbols on Windows: how to make it work with non-English letters?

Intro

I am using Runtime.exec() to execute an external command whose parameters contain non-English characters. I simply want to run something like this: python test.py шалом

It works correctly when run directly in cmd, but the argument is handled incorrectly via Runtime.getRuntime().exec("python test.py шалом")

On Windows my external program fails due to unknown symbols passed to it.

I remember a similar issue from the early 2010s (!), JDK-4947220, but I thought it had been fixed since Java 1.6.

Environments:

OS: Name Microsoft Windows 10 Pro (Version 10.0.18362 Build 18362)

Java: jdk1.8.0_221

Code

The easiest way to understand the question is the code snippet below:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MainClass {
    // Runs the given command line and prints everything it writes to stdout.
    private static void foo(String command) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader input = new BufferedReader(
                new InputStreamReader(
                        Runtime.getRuntime().exec(command).getInputStream()))) {
            String line;
            while ((line = input.readLine()) != null) {
                System.out.println(line);
            }
        } catch (Exception e) { /* ... */ }
    }

    public static void main(String[] args) {
        foo("你好.bat 你好"); // ??
        foo("привет.bat привет"); // ??????
        foo("hi.bat hi"); // hi
    }
}

Each .bat file contains only a simple @echo %1, and the output is:

??
??????
hi

PS: System.out.println("привет") works fine and prints everything correctly.

Questions are the following:

1) Is this issue related to the UTF-8 / UTF-16 encodings?

2) How can this issue be fixed? I do not like this answer, as it looks like a very dangerous and ugly workaround.

3) Does anyone know why the name of the batch file is not broken, so the file can be found, while the argument gets broken? Maybe the problem is in @echo?

  1. Yes, the issue is related to the encodings. In theory, setting code page 65001 (UTF-8) for the cmd that executes the bat files should solve it, together with making UTF-8 the default charset on the Java side.

  2. Unfortunately, there is a bug in Windows, mentioned here: Java, Unicode, UTF-8, and Windows Command Prompt

  3. So there is no simple and complete solution. What can be done is to set the same language-specific default encoding, such as cp1251 for Cyrillic, for both Java and cmd. Not every language is covered well by the Windows encodings; Chinese is one example.
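The "same code page on both sides" idea from point 3 can be sketched roughly as follows, assuming a Cyrillic setup. The command line switches the cmd code page with chcp before calling the batch file, and Java decodes the output with the matching charset. The names CodePageExec, buildCommand and run are hypothetical, and whether chcp inside cmd /c is sufficient on a particular machine is an assumption that has to be verified there.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CodePageExec {
    // Builds: cmd /c "chcp <codePage> >nul && <script> <arg>"
    // (hypothetical helper; the chcp output is discarded with >nul)
    public static List<String> buildCommand(int codePage, String script, String arg) {
        return Arrays.asList("cmd", "/c",
                "chcp " + codePage + " >nul && " + script + " " + arg);
    }

    // Starts the command and decodes its output with the given charset,
    // which is assumed to match the code page selected via chcp.
    public static List<String> run(List<String> command, Charset outputCharset)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream(), outputCharset))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (!System.getProperty("os.name").toLowerCase().contains("windows")) {
            System.out.println("This sketch only makes sense on Windows.");
            return;
        }
        // Code page 1251 on the cmd side, windows-1251 on the Java side.
        List<String> command = buildCommand(1251, "привет.bat", "привет");
        for (String line : run(command, Charset.forName("windows-1251"))) {
            System.out.println(line);
        }
    }
}
```

This only helps for text that the chosen code page can represent, so it would not cover the Chinese example from the question.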

If some non-technical restriction prevents changing the default encoding of all cmd processes on the Windows system to the language-specific one, the Java code becomes more complicated. First, a new cmd process has to be created; then, from separate threads, a reader using UTF-16LE (for a `cmd /U` process) is attached to its stdout and a writer using CP1251 to its stdin. The first command sent to stdin from Java should be 'chcp 1251', and the second is the name of the bat file with its parameters.
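A rough sketch of that interactive approach is shown here, under several assumptions: the class and method names (CmdUnicodeSession, encodeCommands, decodeOutput, runSession) are made up for illustration, the chcp output is sent to nul because chcp is an external program whose output may not be UTF-16LE, and a final exit command is added so that cmd terminates. It has not been verified on every Windows version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CmdUnicodeSession {
    // Encodes the lines that will be "typed" into cmd's stdin.
    public static byte[] encodeCommands(Charset inputCharset, String... commands) {
        StringBuilder sb = new StringBuilder();
        for (String command : commands) {
            sb.append(command).append("\r\n");
        }
        return sb.toString().getBytes(inputCharset);
    }

    // cmd /U writes the output of internal commands (such as echo) as UTF-16LE.
    public static String decodeOutput(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    public static String runSession(Charset inputCharset, String... commands)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("cmd", "/U").redirectErrorStream(true).start();
        ByteArrayOutputStream collected = new ByteArrayOutputStream();

        // Read stdout on its own thread so cmd never blocks on a full pipe.
        Thread reader = new Thread(() -> {
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    collected.write(buf, 0, n);
                }
            } catch (IOException ignored) {
                // sketch only: a real program should report this
            }
        });
        reader.start();

        // Write the commands, then close stdin so cmd sees end of input.
        try (OutputStream out = p.getOutputStream()) {
            out.write(encodeCommands(inputCharset, commands));
        }
        p.waitFor();
        reader.join();
        return decodeOutput(collected.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        if (!System.getProperty("os.name").toLowerCase().contains("windows")) {
            System.out.println("This sketch only makes sense on Windows.");
            return;
        }
        String output = runSession(Charset.forName("windows-1251"),
                "chcp 1251 >nul", "привет.bat привет", "exit");
        System.out.println(output);
    }
}
```

The returned text also contains the cmd banner and the echoed prompts, so a caller would still have to pick the relevant line out of it.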

A complete solution may still use UTF-16LE to read the cmd output, but to pass text in, some other universal encoding has to be used, for example base64, which again increases the complexity.
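The base64 idea could look roughly like this on the Java side: the argument is turned into plain ASCII before it goes on the command line, so no code page can damage it. The class and method names are hypothetical, the URL-safe alphabet is my own choice to avoid '/' and '+' in arguments, and the receiving program is assumed to decode the argument itself (for example with Python's base64 module).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Args {
    // Encodes arbitrary text as ASCII-only, URL-safe Base64 of its UTF-8 bytes.
    public static String encodeArg(String text) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Inverse of encodeArg; shown here only to demonstrate the round trip.
    public static String decodeArg(String encoded) {
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "привет";
        String safe = encodeArg(original);
        // The ASCII-only value is what would be appended to the command line.
        System.out.println("python test.py " + safe);
        System.out.println(decodeArg(safe).equals(original));
    }
}
```

The cost is the one the answer points out: every external program has to be changed, or wrapped, to decode its arguments.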

