String encoding (UTF-8) in Java

Question

Could anyone please help me out here? I want to know the difference between the two string conversions below. I am trying to encode a string to UTF-8. Which one is the correct method?

String string2 = new String(string1.getBytes("UTF-8"), "UTF-8");

OR

String string3 = new String(string1.getBytes(), "UTF-8");

Also, if I use the above two statements together, i.e.

line 1: string1 = new String(string1.getBytes("UTF-8"), "UTF-8");
line 2: string1 = new String(string1.getBytes(), "UTF-8");

will the value of string1 be the same after both lines?

PS: The purpose of all this is to send Japanese text in a web service call, so I want to send it with UTF-8 encoding.

Answer 1

According to the javadoc of String#getBytes(String charsetName):

Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.

And the documentation of String(byte[] bytes, Charset charset):

Constructs a new String by decoding the specified array of bytes using the specified charset.

Thus getBytes() is the opposite operation of String(byte[]). getBytes() encodes the string to bytes, and String(byte[]) decodes the byte array back into a string. You have to use the same charset for both to preserve the actual string value. That is, your second example is wrong:

// This is wrong because getBytes() is called with the default charset,
// but those bytes are then decoded as UTF-8. This will mostly work
// because the default encoding is usually UTF-8, but it can fail,
// so it is wrong.
new String(string1.getBytes(), "UTF-8");
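To make the failure mode concrete, here is a minimal sketch of what happens when the encoding and decoding charsets differ. The class and method names are made up for illustration, and ISO-8859-1 is only an assumed stand-in for a platform whose default charset is not UTF-8; the real default depends on the JVM and operating system.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Encodes with one charset and decodes with another, which is what the
    // wrong example effectively does when the default charset is not UTF-8.
    public static String encodeThenDecode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "Nihongo" in kanji, written with Unicode escapes so the source file
        // encoding does not matter.
        String original = "\u65e5\u672c\u8a9e";

        // Same charset on both sides: the value survives the round trip.
        String same = encodeThenDecode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original)); // true

        // Assumption: ISO-8859-1 plays the role of a non-UTF-8 default charset.
        // It cannot represent kanji, so each character is replaced by '?'.
        String mixed = encodeThenDecode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(mixed.equals(original)); // false
    }
}
```

Note that even the matching round trip does not "encode the string to UTF-8": it returns a String equal to the input, so the first statement in the question is correct but does nothing useful.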

Answer 2

In Java, String and char (a two-byte UTF-16 code unit) hold Unicode text.

Converting to or from byte[] requires the Charset (encoding) of those bytes.

Both String.getBytes() and new String(byte[]) are shortcuts that use the platform default encoding. That is almost always wrong for cross-platform use.

So use

byte[] b = s.getBytes("UTF-8");
s = new String(b, "UTF-8");

Or better, without the checked UnsupportedEncodingException:

byte[] b = s.getBytes(StandardCharsets.UTF_8);
s = new String(b, StandardCharsets.UTF_8);

(Older Android versions, before API level 19, do not have StandardCharsets, however.)
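The difference between the two variants above can be sketched as follows. The class and method names are hypothetical; the point is only that naming the charset by string forces a try/catch for the checked exception, while the StandardCharsets constant does not.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetDemo {
    // Charset named by a string: the compiler requires handling the
    // checked UnsupportedEncodingException.
    public static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Not expected in practice: every Java platform must support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Charset passed as a constant: no checked exception to handle.
    public static byte[] encodeByConstant(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u65e5\u672c\u8a9e"; // Japanese sample text
        // Both variants produce the same bytes.
        System.out.println(Arrays.equals(encodeByName(s), encodeByConstant(s))); // true
    }
}
```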

The same holds for InputStreamReader and OutputStreamWriter, which bridge binary data (InputStream/OutputStream) and text (Reader/Writer).
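As a rough illustration of the stream case, the sketch below passes the charset explicitly to both bridge classes. In-memory streams and the helper names are assumptions chosen to keep the example self-contained; a real program would wrap a file or socket stream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamCharsetDemo {
    // Text -> bytes: the writer encodes characters as UTF-8.
    public static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Bytes -> text: the reader decodes the bytes as UTF-8.
    public static String readUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u65e5\u672c\u8a9e"; // Japanese sample text
        byte[] bytes = writeUtf8(s);
        System.out.println(bytes.length);              // 9 (three bytes per character)
        System.out.println(readUtf8(bytes).equals(s)); // true
    }
}
```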

Answer 3

Please don't confuse yourself. "String" usually refers to values of a datatype that stores text; in this case, java.lang.String.

Serialized text is a sequence of bytes created by applying a character encoding to a string; in this case, byte[].

There are no UTF-8-encoded strings in Java.

If your web service client library takes a string, pass it the string. If it lets you specify an encoding to use for serialization, pass it StandardCharsets.UTF_8 or equivalent.

If it doesn't take a string, then pass it string1.getBytes(StandardCharsets.UTF_8) and use whatever other mechanism it provides to tell the recipient that the bytes are UTF-8-encoded text. Or get a different client library.
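The distinction between the string and its serialized bytes can be sketched like this. Everything named here is hypothetical: the class, the helpers, and the Content-Type value are assumptions standing in for whatever mechanism a given client library offers to declare the encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf8PayloadDemo {
    // Assumed example of telling the recipient how the bytes are encoded;
    // the actual mechanism depends on the client library and protocol.
    public static final String CONTENT_TYPE = "text/plain; charset=UTF-8";

    // The encoding exists only in the byte[]; the String itself has none.
    public static byte[] toPayload(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Renders bytes as lowercase hex so the encoded form can be inspected.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // Japanese sample text
        byte[] payload = toPayload(text);
        System.out.println(CONTENT_TYPE);
        System.out.println(toHex(payload)); // e697a5e69cace8aa9e
    }
}
```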
