
Java: Runtime.exec() and Unicode symbols on Windows: how to make it work with non-English letters?

Intro

I am using Runtime.exec() to execute an external command, and the parameters contain non-English characters. I simply want to run something like this: python test.py шалом

It works correctly when run directly in cmd, but is handled incorrectly via Runtime.getRuntime().exec("python test.py шалом")

On Windows my external program fails because of the unknown symbols passed to it.

I remember a similar issue from the early 2010s (!), JDK-4947220, but I thought it had been fixed since Java 1.6.

Environments:

OS: Microsoft Windows 10 Pro (Version 10.0.18362 Build 18362)

Java: jdk1.8.0_221

Code

The best way to understand the question is to run the code snippet below:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MainClass {
    private static void foo(String filename) {
        try {
            BufferedReader input = new BufferedReader(
                    new InputStreamReader(
                            Runtime.getRuntime().exec(filename).getInputStream()));
            String line;
            while ((line = input.readLine()) != null) {
                System.out.println(line);
            }
            input.close();
        } catch (Exception e) { /* ... */ }
    }

    public static void main(String[] args) {
        foo("你好.bat 你好"); // ??
        foo("привет.bat привет"); // ??????
        foo("hi.bat hi"); // hi
    }
}

Each .bat file contains only a simple @echo %1. The output will be:

??
??????
hi

PS: System.out.println("привет") works fine and prints everything correctly.

The questions are the following:

1) Is this issue related to the UTF-8/UTF-16 formats?

2) How can this issue be fixed? I do not like this answer, as it looks like a very dangerous and ugly workaround.

3) Does anyone know why the name of the batch file is not broken, so the file can be found, but the argument gets broken? Maybe it is a problem with @echo?

  1. Yes, the issue is related to UTF encodings. In theory, setting code page 65001 for the cmd that executes the bat files should solve the issue (along with setting UTF-8 as the default charset on the Java side).

  2. Unfortunately, there is a bug in Windows, mentioned in "Java, Unicode, UTF-8, and Windows Command Prompt".

  3. So there is no simple and complete solution. What is possible is to set the same language-specific default encoding, such as cp1251 (Cyrillic), for both java and cmd. Not all languages are well covered by the Windows encodings; Chinese is one example.
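Point 3 can be pictured as an encode/decode round trip: text survives only if the code page used on both sides can represent it. The sketch below is a pure-Java illustration of that idea, not a fix. It starts no process, the names CodePageRoundTrip and roundTrip are made up for this example, and the strings are written as \u escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CodePageRoundTrip {
    /** Encodes text into a code page and decodes it back, as a byte pipe between two programs would. */
    static String roundTrip(String text, Charset codePage) {
        // String.getBytes(Charset) replaces unmappable characters with the charset's replacement byte.
        return new String(text.getBytes(codePage), codePage);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        String privet = "\u043f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic "privet" (6 letters)
        String niHao = "\u4f60\u597d";                          // Chinese "ni hao" (2 characters)

        // Cyrillic survives cp1251 unchanged.
        System.out.println(roundTrip(privet, cp1251).equals(privet)); // true
        // Chinese has no mapping in cp1251, so each character becomes '?'.
        System.out.println(roundTrip(niHao, cp1251)); // ??
        // A code page without Cyrillic loses every letter.
        System.out.println(roundTrip(privet, StandardCharsets.US_ASCII)); // ??????
    }
}
```

The two lossy results have the same shape as the ?? and ?????? lines in the question's output, which is consistent with the arguments passing through a code page that lacks those characters somewhere along the way.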

If some non-technical restriction prevents changing the default encoding to the language-specific one for all cmd processes on the Windows system, the Java code becomes more complicated. First, a new cmd process has to be created, and a reader with UTF-16LE (for a `cmd /U` process) and a writer with CP1251 should be attached to its stdout and stdin streams, respectively, from different threads. The first command sent to stdin from Java should be 'chcp 1251', and the second is the name of the bat file with its parameters.
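The steps described above could look roughly like the following. This is an untested sketch, not a verified fix: the class and method names are made up, the `> nul` after chcp and the trailing `exit` are my additions (so that chcp's own message stays out of the UTF-16LE stream and cmd terminates), and whether cmd really reads piped stdin in the new code page is taken from the description above rather than checked here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class CmdUnicodeSession {
    // Assumption: cmd reads its piped stdin in this code page once "chcp 1251" has run.
    static final Charset STDIN_CODE_PAGE = Charset.forName("windows-1251");

    /** Lines written to cmd's stdin: switch the code page, run the batch file, then quit. */
    static List<String> script(String batchCommand) {
        return Arrays.asList("chcp 1251 > nul", batchCommand, "exit");
    }

    static String run(String batchCommand) throws IOException, InterruptedException {
        // /U: output of internal commands written to a pipe is Unicode (UTF-16LE); /Q: echo off.
        Process cmd = new ProcessBuilder("cmd", "/U", "/Q")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();

        // Writer thread: feeds the commands to stdin encoded as CP1251.
        Thread writer = new Thread(() -> {
            try (Writer stdin = new OutputStreamWriter(cmd.getOutputStream(), STDIN_CODE_PAGE)) {
                for (String line : script(batchCommand)) {
                    stdin.write(line);
                    stdin.write("\r\n");
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        // Calling thread: reads stdout decoded as UTF-16LE until cmd closes the pipe.
        StringBuilder output = new StringBuilder();
        try (BufferedReader stdout = new BufferedReader(
                new InputStreamReader(cmd.getInputStream(), StandardCharsets.UTF_16LE))) {
            String line;
            while ((line = stdout.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        writer.join();
        cmd.waitFor();
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        if (!System.getProperty("os.name").startsWith("Windows")) {
            System.out.println("This sketch only makes sense on Windows.");
            return;
        }
        // The batch file name and its parameters are taken from the command line.
        System.out.print(run(String.join(" ", args)));
    }
}
```

The returned text may also contain cmd's banner and prompts, so a real caller would still have to pick the batch file's output out of it.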

A complete solution can still use UTF-16LE for reading the cmd output, but to pass text in, some other universal encoding should be used, for example base64, which again increases complexity.
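A minimal sketch of the base64 idea, under the assumption that the receiving program (here the question's test.py) is changed to base64-decode its argument and interpret the bytes as UTF-8. Base64Argument and its methods are hypothetical names. The URL-safe alphabet without padding is my own choice, made because characters such as = can be treated as argument delimiters by batch files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

final class Base64Argument {
    /** Turns arbitrary text into an ASCII-only token (letters, digits, '-' and '_'). */
    static String encode(String text) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** What the receiving side has to do to get the original text back. */
    static String decode(String token) {
        return new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
    }

    /** Command line for ProcessBuilder; the non-ASCII text travels as its encoded token. */
    static List<String> command(String text) {
        return Arrays.asList("python", "test.py", encode(text));
    }

    public static void main(String[] args) {
        String privet = "\u043f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic "privet"
        String token = encode(privet);
        System.out.println(token);                        // ASCII only, safe in any code page
        System.out.println(decode(token).equals(privet)); // true
        System.out.println(command(privet));
    }
}
```

Because the token is plain ASCII, it should not matter which code page cmd or the JVM uses for the command line; the cost is that both ends have to agree on the extra encoding step.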
