More Unicode characters in Windows console than expected

I want to print Russian and German characters in the Windows console, so I wrote a small test program to see how well it works:

PrintStream ps = new PrintStream(System.out, false, "UTF-8");
ps.println("öäüß гджщ");

Then I started cmd.exe, changed its font to Lucida Console (which supports Unicode), switched the code page to UTF-8 with "chcp 65001", and ran my program.

The German and Russian characters were printed, but there was a little more text than I expected (underlined in red): [screenshot]

The same text is printed correctly in the Eclipse console. Is there a way to print it correctly in the Windows console? I use Windows 7.

I've just solved the problem with JNI, but I'm still interested in whether it can be done in pure Java.

Every time you open or write a file, some encoding is applied. What we sometimes forget is that the IDE (Eclipse in your case) has an encoding of its own.

When you type text between quotes, it is stored and displayed in the encoding of your IDE. You are assuming that setting the encoding of your output stream (UTF-8) also guarantees that the string literal itself was read as UTF-8. I think the IDE's encoding is what gets applied to the literal.

I would suggest double-checking the encoding configured in Eclipse. Perhaps that solves your problem. Certainly worth a try, isn't it? :)
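One way to take the source-file encoding out of the picture is to spell the literal with Unicode escapes, which the compiler reads the same way whatever encoding the IDE uses. The following is a minimal sketch, not the asker's program: the class name `EscapedLiteralSketch` and the helper `utf8Length` are made up for illustration. If the extra text still appears with this version, the IDE encoding is probably not the cause.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question.
final class EscapedLiteralSketch {
    // Same nine characters as the question's literal (four German letters,
    // a space, four Russian letters), written as escapes so the result does
    // not depend on how the IDE saved the source file.
    static final String TEXT =
            "\u00F6\u00E4\u00FC\u00DF \u0433\u0434\u0436\u0449";

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream ps = new PrintStream(System.out, true, "UTF-8");
        ps.println(TEXT);
        ps.println(TEXT.length() + " chars, " + utf8Length(TEXT) + " UTF-8 bytes");
    }
}
```

Every non-ASCII character here takes two bytes in UTF-8, so the byte count is larger than the character count; that difference is worth keeping in mind when reading the later answers.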

For a global encoding setting, add the following line to the eclipse.ini file (JVM options go after the -vmargs line):

-Dfile.encoding=UTF-8 
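To see which encoding a JVM actually ended up with, a small check like the one below may help. It is only a sketch; the class name `EncodingReport` is invented here. Note that it reports the settings of the JVM that runs it, so it has to be launched the same way as the program under test.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not part of the original answer.
final class EncodingReport {
    // Describes the encoding-related settings of the running JVM.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```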

EDIT:

I would just like to add the following. I performed the following steps as an experiment.

  1. I opened Notepad++ and created a new file
  2. I modified the encoding setting to UTF-8
  3. I copied your Russian text and pasted it in my new text file and saved it.
  4. Next I opened my Windows console ("cmd")
  5. I executed the "chcp 65001" command.
  6. Next I printed the content of the file in my console: "type file.txt"
  7. Everything shows correctly.

This does not prove much, but it does show that the Windows console can display the text when the content is supplied in the right encoding.

EDIT2:

@ka3ak It's been over 2 years, but while reading a book about Java I/O I stumbled upon the following.

System.console().printf(...) has better support for special characters than the System.out.println(...) method.

Since the PrintStream just wraps around the System.out stream, I guess you have the same limitations. I am wondering if this could have solved the problem. If it still matters, please give it a try. :)
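A minimal sketch of that suggestion, for anyone who wants to try it. `System.console()` returns null when the JVM has no interactive console (for example when output is redirected or when running inside some IDEs), so the sketch falls back to `System.out` in that case. The class and method names are invented for illustration, and whether this avoids the extra text on Windows 7 is untested here.

```java
import java.io.Console;

// Hypothetical helper class, not part of the original answer.
final class ConsolePrintfSketch {
    // Prints one line through java.io.Console when one is attached.
    // Returns true if the Console was used, false if System.out was used.
    static boolean printLine(String text) {
        Console console = System.console();
        if (console != null) {
            console.printf("%s%n", text);
            console.flush();
            return true;
        }
        System.out.println(text);
        return false;
    }

    public static void main(String[] args) {
        boolean viaConsole = printLine("\u00F6\u00E4\u00FC\u00DF \u0433\u0434\u0436\u0449");
        System.err.println(viaConsole ? "used System.console()" : "fell back to System.out");
    }
}
```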

Other posts on Stack Overflow report similar behavior, for example "console.writeline and System.out.println".

After reading the answers and recommendations here, I concluded that there must be a problem in the JRE. Maybe the problem only exists on Windows 7 (unfortunately I don't have other Windows systems to experiment with).

The solution is to use JNI or, if you want something simpler, JNA. I've found a useful JNA example that solves my problem here: https://stackoverflow.com/a/8921509/971355

This is due to a ¼-hearted implementation of cp65001 in Windows. See the full explanation in @eryksun's answer.

Short summary: up to Windows 7, only 7-bit (sic!) input/output works reliably in cp65001 (unless the C runtime library works around it). The output problem is fixed in Windows 8. The input problem is still present in Windows 10.
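The summary does not spell out the mechanism. One explanation commonly given for the pre-Windows 8 output problem is that a console write under cp65001 reports the number of characters written rather than the number of bytes, so a caller that loops until every byte is written sends the tail of the buffer again. The sketch below is a pure simulation under that assumption; it does not call any Windows API, and `buggyWrite` and `writeAll` are made-up names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical simulation, assuming the write reports characters, not bytes.
final class ShortCountSimulation {
    // Stand-in for the console write: "displays" all the bytes it is given,
    // but returns the count of decoded characters instead of bytes.
    static int buggyWrite(byte[] buf, int off, int len, StringBuilder screen) {
        String decoded = new String(buf, off, len, StandardCharsets.UTF_8);
        screen.append(decoded);
        return decoded.length();
    }

    // A typical "write until done" loop that trusts the returned count.
    // Returns everything that ended up on the simulated screen.
    static String writeAll(String text) {
        byte[] buf = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder screen = new StringBuilder();
        int off = 0;
        while (off < buf.length) {
            off += buggyWrite(buf, off, buf.length - off, screen);
        }
        return screen.toString();
    }

    public static void main(String[] args) {
        System.out.print(writeAll("\u00F6\u00E4\u00FC\u00DF \u0433\u0434\u0436\u0449\n"));
    }
}
```

With pure ASCII text the character count equals the byte count, so the loop finishes after one write and nothing is repeated. With two-byte characters the reported count falls short, and the loop resends the end of the line, which would be consistent with the "7-bit only" remark above.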
