Fail to read Japanese Characters via System.in
Code:
Scanner sc = new Scanner(System.in);
System.out.println("Enter Name : ");
String name = sc.nextLine();
System.out.println(name);
String encoding = "UTF-8";
System.out.println(new String(name.getBytes(encoding), "euc-jp"));
System.out.println(new String(name.getBytes(encoding), "Shift_JIS"));
System.out.println(new String(name.getBytes(encoding), "ISO-2022-JP"));
System.out.println(new String(name.getBytes(encoding), "ISO8859-1"));
Input:
Enter Name : たなかです
Output:
F Q N @
鐃 鐃 鐃緒申鐃 鐃
ソスF ソスQ ソス ソス ソスN ソス@
...
F Q N @
�F�Q���N�@
None of them are readable Japanese. I've also tried InputStreamReader and DataInputStream with byte[].
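Since the question mentions InputStreamReader, here is a minimal sketch of how it takes an explicit charset instead of the platform default. It assumes the terminal sends UTF-8, which the question does not confirm. The class ReadWithCharset and method readLine are hypothetical names, and a ByteArrayInputStream stands in for System.in so the example runs without a console.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Reads one line from `in`, decoding its bytes with the given charset instead of
    // the platform default. The reader is deliberately not closed, because closing it
    // would also close the underlying stream (System.in in real use). For repeated
    // reads, create one reader and reuse it, since BufferedReader reads ahead.
    static String readLine(InputStream in, Charset cs) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u305F\u306A\u304B\u3067\u3059"; // "たなかです"
        // Simulate a terminal that sends the typed line as UTF-8 bytes (an assumption).
        byte[] typed = (name + "\n").getBytes(StandardCharsets.UTF_8);

        String ok = readLine(new ByteArrayInputStream(typed), StandardCharsets.UTF_8);
        String wrong = readLine(new ByteArrayInputStream(typed), StandardCharsets.ISO_8859_1);
        System.out.println(ok.equals(name));    // true: decoder matches the bytes
        System.out.println(wrong.equals(name)); // false: same bytes, wrong decoder
    }
}
```

On a real console the call would be readLine(System.in, StandardCharsets.UTF_8); Scanner also has a constructor that accepts a charset name, e.g. new Scanner(System.in, "UTF-8"). If the console sends another encoding, that charset has to be passed instead.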
name.getBytes(encoding) in your code returns the raw bytes of the String name encoded as UTF-8. So for the string "たなかです" you typed in the console, you get the byte array
{0xE3, 0x81, 0x9F, 0xE3, 0x81, 0xAA, 0xE3, 0x81, 0x8B, 0xE3, 0x81, 0xA7, 0xE3, 0x81, 0x99}
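To check the byte listing above, this small sketch encodes the string as UTF-8 and formats the bytes the same way. Utf8Bytes and toHex are illustrative names, and the Japanese text is written as Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Formats bytes as {0xE3, 0x81, ...}, matching the listing in the answer.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Mask to 0..255 because Java bytes are signed.
            sb.append(String.format("0x%02X", bytes[i] & 0xFF));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String name = "\u305F\u306A\u304B\u3067\u3059"; // "たなかです"
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 15: each of these hiragana takes 3 bytes in UTF-8
        System.out.println(toHex(utf8));
        // {0xE3, 0x81, 0x9F, 0xE3, 0x81, 0xAA, 0xE3, 0x81, 0x8B, 0xE3, 0x81, 0xA7, 0xE3, 0x81, 0x99}
    }
}
```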
These are UTF-8 encoded bytes, so the only charset you can correctly pass as the second argument of the constructor String(byte[] bytes, String charsetName) is UTF-8:
System.out.println(new String(name.getBytes(encoding), "UTF-8"));
This converts the byte array {0xE3, 0x81, 0x9F, ...} back into a String object and prints it to the console properly.
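The same point as a runnable sketch: bytes produced by one charset only turn back into the original text when decoded with that same charset. DecodeMismatch and roundTrip are hypothetical names. EUC-JP and Shift_JIS are assumed to be available; typical OpenJDK builds include them, but they are not among the charsets every Java implementation is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatch {
    // Encodes with one charset and decodes with another, like the question's
    // new String(name.getBytes(encoding), "euc-jp") calls.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String name = "\u305F\u306A\u304B\u3067\u3059"; // "たなかです"
        Charset utf8 = StandardCharsets.UTF_8;

        // Matching charsets: the text survives.
        System.out.println(roundTrip(name, utf8, utf8).equals(name)); // true

        // Mismatched charsets: the UTF-8 bytes are misread (mojibake).
        for (String other : new String[] {"EUC-JP", "Shift_JIS", "ISO-8859-1"}) {
            boolean same = roundTrip(name, utf8, Charset.forName(other)).equals(name);
            System.out.println(other + ": " + same); // false for each
        }
    }
}
```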
A String object uses UTF-16 for its internal text representation (see https://docs.oracle.com/javase/8/docs/technotes/guides/intl/overview.html for details).
So if you want a byte array that matches the internal text representation, use name.getBytes("UTF-16"). Note that Java's "UTF-16" encoder writes big-endian bytes and prepends a byte-order mark (0xFE 0xFF); "UTF-16BE" produces the same bytes without the mark. You can turn the bytes back into a String object with
System.out.println(new String(name.getBytes("UTF-16"), "UTF-16"));
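A sketch that shows the UTF-16 bytes for the same string, with and without the byte-order mark, and the round trip back to a String. Utf16Bytes and hex are hypothetical names; StandardCharsets.UTF_16 and StandardCharsets.UTF_16BE are the standard JDK constants for the charset names used above.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    // Formats bytes as space-separated uppercase hex, e.g. "FE FF 30 5F".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u305F\u306A\u304B\u3067\u3059"; // "たなかです"
        // Java's "UTF-16" encoder writes big-endian and prepends a byte-order mark.
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_16)));
        // prints FE FF 30 5F 30 6A 30 4B 30 67 30 59
        // "UTF-16BE" writes the same code units without the mark.
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_16BE)));
        // prints 30 5F 30 6A 30 4B 30 67 30 59
        // Decoding with the same charset restores the original string.
        String back = new String(name.getBytes(StandardCharsets.UTF_16), StandardCharsets.UTF_16);
        System.out.println(back.equals(name)); // true
    }
}
```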
There is a slight problem in the following snippet of your code: you encode with one charset and decode with a different one.
String encoding = System.getProperty("file.encoding");
System.out.println(new String(name.getBytes(encoding), "UTF-8"));
Assuming you want to print the Japanese characters using different charsets, encode and decode with the same charset:
System.out.println(new String(name.getBytes("euc-jp"), "euc-jp"));
System.out.println(new String(name.getBytes("Shift_JIS"), "Shift_JIS"));
System.out.println(new String(name.getBytes("ISO-2022-JP"), "ISO-2022-JP"));
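// ISO-8859-1 has no Japanese characters, so this round trip cannot reproduce the input (unmappable characters typically come back as '?')
System.out.println(new String(name.getBytes("ISO8859-1"), "ISO8859-1"));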
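A sketch of the same-charset round trip suggested above, assuming the JDK provides EUC-JP, Shift_JIS and ISO-2022-JP (typical OpenJDK builds do). It also shows why the ISO-8859-1 line cannot work: that charset has no Japanese characters, and String.getBytes(Charset) replaces unmappable characters with the charset's default replacement, which is '?' for ISO-8859-1. SameCharsetRoundTrip and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameCharsetRoundTrip {
    // Encodes and decodes with the same charset. This only preserves the text
    // if the charset can represent every character in it.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String name = "\u305F\u306A\u304B\u3067\u3059"; // "たなかです"
        for (String cs : new String[] {"EUC-JP", "Shift_JIS", "ISO-2022-JP"}) {
            boolean same = roundTrip(name, Charset.forName(cs)).equals(name);
            System.out.println(cs + " preserves text: " + same);
        }
        // ISO-8859-1 cannot encode hiragana; getBytes(Charset) substitutes '?' for each one.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)); // ?????
    }
}
```

A successful round trip only shows that the String was intact to begin with; whether the console displays it correctly still depends on the encoding used by System.out and the terminal.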
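Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.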