
Is specifying String encoding when parsing byte[] really necessary?

Supposedly, it is "best practice" to specify the encoding when creating a String from a byte[]:

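byte[] bytes = {104, 105}; // sample input: "hi" in ASCII/UTF-8
String a = new String(bytes, "UTF-8"); // 100% safe (this overload declares the checked UnsupportedEncodingException)
String b = new String(bytes); // safe enough; decodes with the platform default charset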

If I know my installation has a default encoding of UTF-8, is it really necessary to specify the encoding in order to follow "best practice"?

Different use cases have to be distinguished here. If you get the bytes from an external source via some protocol with a specified encoding, then always use the first form (with explicit encoding).

If the source of the bytes is the local machine, for example a local text file, the second form (without explicit encoding) is better.

Always keep in mind that your program may be used on a different machine with a different platform encoding. It should work there without any changes.
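To make the difference between the two forms concrete, here is a minimal sketch of how the same bytes decode under two different charsets. The class name DecodeDemo and the helper decode are made up for illustration; ISO-8859-1 simply stands in for "some other platform encoding".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper: decodes bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);        // one character
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1); // two characters

        System.out.println("UTF-8 decode length:      " + asUtf8.length());
        System.out.println("ISO-8859-1 decode length: " + asLatin1.length());

        // new String(bytes) behaves like decode(bytes, Charset.defaultCharset()),
        // so its result depends on the machine the program runs on.
        System.out.println("Platform default charset: " + Charset.defaultCharset());
    }
}
```

If the bytes came from a protocol that specifies UTF-8, only the first decode is correct, whatever the platform default happens to be.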

If I know my installation has a default encoding of UTF-8, is it really necessary to specify the encoding in order to follow "best practice"?

But do you know for sure that your installation will always have a default encoding of UTF-8? (Or at least, for as long as your code is used ...)

And do you know for sure that your code is never going to be used in a different installation that has a different default encoding?

If the answer to either of those is "No" (and unless you are prescient, it probably has to be "No"), then I think you should follow best practice ... and specify the encoding if that is what your application semantics require:

  • If the requirement is to always encode (or decode) in UTF-8, then use "UTF-8".

  • If the requirement is to always encode (or decode) using the platform default, then do that.

  • If the requirement is to support multiple encodings (or the requirement might change), then make the encoding name a configuration (or command line) parameter, resolve it to a Charset object and use that.
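The third option could be sketched roughly as follows. The class CharsetConfig and the method resolve are hypothetical names. Treating a missing or blank name as "use the platform default" is an assumption of this sketch, not something the answer prescribes.

```java
import java.nio.charset.Charset;

class CharsetConfig {
    // Hypothetical helper: turns a configured charset name into a Charset object.
    // Assumption: null or blank means "fall back to the platform default".
    static Charset resolve(String name) {
        if (name == null || name.trim().isEmpty()) {
            return Charset.defaultCharset();
        }
        // Charset.forName throws an unchecked exception for illegal or unsupported names.
        return Charset.forName(name.trim());
    }

    public static void main(String[] args) {
        // The first command line argument, if present, names the encoding.
        Charset charset = resolve(args.length > 0 ? args[0] : null);

        byte[] bytes = "caf\u00e9".getBytes(charset);
        String roundTripped = new String(bytes, charset);

        System.out.println("Using charset: " + charset);
        System.out.println("Encoded to " + bytes.length + " bytes, decoded length " + roundTripped.length());
    }
}
```

Resolving the name once and passing the Charset object around means a bad name fails at startup rather than at the first decode, and the Charset overloads of String do not declare the checked UnsupportedEncodingException.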

The point of this "best practice" recommendation is to avoid a foreseeable problem that will arise if your platform's characteristics change. You don't think that is likely, but you probably can't be completely sure about it. But at the end of the day, it is your decision.

(The fact that you are actually thinking about whether "best practice" is appropriate to your situation is a GOOD THING ... in my opinion.)
