
More Unicode characters in Windows console than expected

I want to print Russian and German characters in the Windows console, so I wrote a small test program to see how well it works:

PrintStream ps = new PrintStream(System.out, false, "UTF-8");
ps.println("öäüß гджщ");

Then I started cmd.exe, changed its font to Lucida Console (which supports Unicode), switched the code page to UTF-8 with "chcp 65001", and ran my program.

The German and Russian characters were printed, but there was a little more text than I expected (underlined in red):

The text is printed correctly in the Eclipse console, though. Is there a way to print it correctly in the Windows console? I use Windows 7.

I've just solved the problem with JNI, but I'm still interested in whether it is doable in pure Java.

Every time you open or write a file, a certain encoding is applied. But we sometimes forget that the IDE (Eclipse in your case) has an encoding as well.

When you type text between quotes, it is entered and displayed in a certain encoding: the encoding of your IDE. You are assuming that the encoding of your output stream (UTF-8) also guarantees that the text is displayed in that encoding. However, I think the encoding of your IDE is applied here as well.
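To make the point about mismatched encodings concrete, here is a minimal sketch of what happens when text written in one encoding is read back in another. The class and method names are made up for illustration. The letter is written as a Unicode escape so that the encoding of the source file itself cannot affect the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SourceEncodingDemo {
    // Encode the text with one charset and decode the resulting bytes with
    // another, which is what a tool does when it misreads a file's encoding.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String original = "\u00F6"; // the letter "ö"
        String misread = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Print code points instead of the characters themselves, so the
        // result does not depend on the console encoding.
        System.out.println("original length: " + original.length());
        System.out.println("misread length:  " + misread.length());
        misread.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

One character turns into two after the round trip, because its two UTF-8 bytes are each decoded as a separate ISO-8859-1 character.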

I suggest double-checking the encoding settings in Eclipse. Perhaps this solves your problem. Certainly worth a try, isn't it? :)

For a global encoding setting, add the following line to the eclipse.ini file:

-Dfile.encoding=UTF-8 
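To see which encoding a given JVM actually picked up, a small check like the following may help. It is only a sketch with a made-up class name. It reports what the running JVM sees, which is not necessarily what Eclipse uses for its editors or for launched programs.

```java
import java.nio.charset.Charset;

final class DefaultEncodingCheck {
    // Report the file.encoding system property and the JVM default charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 on the command line shows whether the option has any effect on your setup.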

EDIT:

I would just like to add the following. I performed these steps as an experiment:

  1. I opened Notepad++ and created a new file.
  2. I changed the encoding setting to UTF-8.
  3. I copied your Russian text, pasted it into the new text file, and saved it.
  4. Next I opened the Windows console ("cmd").
  5. I executed the "chcp 65001" command.
  6. Then I printed the content of the file in the console: "type file.txt"
  7. Everything was displayed correctly.

This does not prove much, but it does confirm that the Windows console can do the job if the content is supplied in the right encoding.
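The file from this experiment could also be produced from Java instead of Notepad++. The following is a rough sketch under that assumption. The class name is hypothetical and the file name file.txt is taken from step 6. It writes the Russian letters as UTF-8 bytes without a byte order mark.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class WriteUtf8File {
    // The Russian letters from the question, as escapes so the source
    // file encoding does not matter.
    static final String RUSSIAN = "\u0433\u0434\u0436\u0449";

    // Write the text to the target file as raw UTF-8 bytes.
    static Path write(Path target, String text) throws IOException {
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path file = write(Paths.get("file.txt"), RUSSIAN);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file.toAbsolutePath());
    }
}
```

After running it, "chcp 65001" followed by "type file.txt" repeats steps 5 and 6 of the experiment.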

EDIT2:

@ka3ak It's been over 2 years, but while reading a book about Java I/O I stumbled upon the following.

System.console().printf(...) has better support for special characters than the System.out.println(...) method.

Since your PrintStream just wraps the System.out stream, I guess it has the same limitations. I wonder whether this could have solved the problem. If it still matters, please give it a try. :)
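A minimal sketch of that suggestion could look like this. The class and method names are made up, and whether it really avoids the extra characters on Windows 7 is untested here. Note that System.console() can return null when no interactive console is attached, which is common when running inside an IDE, so the sketch falls back to System.out.

```java
import java.io.Console;

final class ConsolePrintDemo {
    // The test string from the question, written with Unicode escapes.
    static final String TEXT = "\u00F6\u00E4\u00FC\u00DF \u0433\u0434\u0436\u0449";

    // Print through the Console if one is available, otherwise through
    // System.out. Returns which path was taken.
    static String print(Console console, String text) {
        if (console != null) {
            console.printf("%s%n", text);
            return "console";
        }
        System.out.println(text);
        return "stdout";
    }

    public static void main(String[] args) {
        String used = print(System.console(), TEXT);
        System.err.println("printed via " + used);
    }
}
```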

Other posts on Stack Overflow report similar things: console.writeline and System.out.println

After reading the answers and recommendations here, I concluded that there must be a problem with the JRE. Maybe this problem only exists on Windows 7 (unfortunately I don't have other Windows systems to experiment with).

The solution is to use JNI or, if you want something simpler, JNA. I found a useful JNA example that solves my problem here: https://stackoverflow.com/a/8921509/971355

This is due to the ¼-hearted implementation of cp65001 in Windows. See the full details in @eryksun's answer.

Short summary: up to Windows 7, only 7-bit (sic!) input/output works reliably in cp65001 (unless a CRTL provides workarounds). The problem with output is fixed in Windows 8. The problem with input is still present in Windows 10.
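To relate the 7-bit remark to the test string from the question, the following sketch (with a hypothetical class name) counts how many of its characters lie outside the 7-bit ASCII range and how many bytes the string occupies in UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class SevenBitCheck {
    // The test string from the question, written with Unicode escapes.
    static final String TEXT = "\u00F6\u00E4\u00FC\u00DF \u0433\u0434\u0436\u0449";

    // Number of UTF-16 code units above 0x7F, i.e. outside 7-bit ASCII.
    static long nonAsciiCount(String text) {
        return text.chars().filter(c -> c > 0x7F).count();
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("chars:      " + TEXT.length());        // 9
        System.out.println("non-ASCII:  " + nonAsciiCount(TEXT));  // 8
        System.out.println("UTF-8 size: " + utf8Length(TEXT));     // 17
    }
}
```

Only the space is plain 7-bit ASCII. Every letter takes two bytes in UTF-8, so the whole string falls into the range that the summary describes as unreliable.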
