Java String encoding (UTF-8)
I have come across this line of legacy code, which I am trying to figure out:
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
As far as I can understand, it encodes and then decodes using the same charset.
How is this different from the following?
String newString = oldString;
Is there any scenario in which the two lines will have different outputs?
PS: Just to clarify, yes, I am aware of Joel Spolsky's excellent article on encoding!
This could be a complicated way of doing
String newString = new String(oldString);
This shortens the String if the underlying char[] it uses is much longer. That mattered in Java versions before 7u6, where substring() could share a much larger backing array.
More specifically, though, the round trip checks that every character can be UTF-8 encoded. There are some "characters" you can have in a String which cannot be encoded, and these are turned into '?'.
Any unpaired surrogate cannot be encoded and will be turned into '?'. This means any char between \uD800 and \uDFFF that is not part of a valid high/low surrogate pair.
String oldString = "\uD800";
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));
prints
false
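The sketch below shows which strings survive the round trip and which do not. It assumes Java 7 or later. It uses StandardCharsets.UTF_8 instead of the "UTF-8" name, so there is no checked UnsupportedEncodingException. The class and method names (RoundTripDemo, roundTrip, canEncodeUtf8) are hypothetical. It also uses CharsetEncoder.canEncode to detect unencodable input before any '?' substitution happens.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode to UTF-8 bytes and decode them again, as the legacy line does.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // True if the whole string is well-formed UTF-16 that UTF-8 can represent.
    // A fresh encoder is created per call because canEncode throws
    // IllegalStateException if an encoding operation is already in progress.
    static boolean canEncodeUtf8(String s) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String[] samples = {
            "plain ASCII",
            "\u00e9\u4e2d",   // U+00E9 and U+4E2D: ordinary BMP characters
            "\uD83D\uDE00",   // valid surrogate pair for U+1F600, encoded as 4 bytes
            "\uD800"          // lone high surrogate: cannot be encoded
        };
        for (String s : samples) {
            // Prints whether s is encodable and whether it survived the round trip.
            System.out.println(canEncodeUtf8(s) + " " + roundTrip(s).equals(s));
        }
    }
}
```

A valid surrogate pair round-trips unchanged. Only the lone surrogate comes back as "?", and canEncodeUtf8 flags it in advance.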
How is this different from the following?
This line of code here:
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
constructs a new String object (i.e. a copy of oldString), while this line of code:
String newString = oldString;
declares a new variable of type java.lang.String and initializes it to refer to the same String object as the variable oldString.
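The difference between aliasing and copying can be checked directly with == (reference identity) and equals (character content). The sketch below is a minimal illustration with a hypothetical IdentityDemo class. It uses StandardCharsets.UTF_8 (Java 7+) in place of the "UTF-8" string, which does not change the result.

```java
import java.nio.charset.StandardCharsets;

public class IdentityDemo {
    // Returns {alias == old, copy == old, copy.equals(old)}.
    static boolean[] compare(String oldString) {
        // Plain assignment: both variables refer to the same object.
        String alias = oldString;
        // Encode/decode round trip: `new` always yields a distinct String object.
        String copy = new String(oldString.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        return new boolean[] { alias == oldString, copy == oldString, copy.equals(oldString) };
    }

    public static void main(String[] args) {
        boolean[] r = compare("legacy");
        System.out.println("alias same instance: " + r[0]);
        System.out.println("copy same instance:  " + r[1]);
        System.out.println("copy equal content:  " + r[2]);
    }
}
```

For an ordinary string like "legacy", the alias is the same instance. The copy is a different instance with equal content.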
Is there any scenario in which the two lines will have different outputs?
Absolutely:
String newString = oldString;
boolean isSameInstance = newString == oldString; // isSameInstance == true
vs.
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
boolean isSameInstance = newString == oldString; // isSameInstance == false, since `new` always creates a distinct object
a_horse_with_no_name (see comment) is right, of course. The equivalent of
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
is
String newString = new String(oldString);
minus the subtle encoding difference that Peter Lawrey explains in his answer.
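Here is a minimal sketch of that "subtle difference". It compares the copy constructor new String(String) with the UTF-8 round trip, using a hypothetical CopyVsRoundTrip class and StandardCharsets.UTF_8 (Java 7+). Both produce a new object. Only the round trip alters strings containing unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

public class CopyVsRoundTrip {
    // Copy constructor: new object, identical chars, no encoding involved.
    static String copy(String s) {
        return new String(s);
    }

    // UTF-8 round trip: new object, but each unpaired surrogate becomes '?'.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The last sample contains a lone low surrogate between 'a' and 'b'.
        String[] samples = { "hello", "\uD83D\uDE00", "a\uDC00b" };
        for (String s : samples) {
            System.out.println(copy(s).equals(s) + " " + roundTrip(s).equals(s));
        }
    }
}
```

For well-formed text, the two lines are interchangeable. For "a" + lone surrogate + "b", the copy is unchanged, while the round trip yields "a?b".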