
In Java, what is the best way to ensure that I'm getting UTF-8 strings?

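When collecting query parameters from a beacon system in a Servlet, what is the best way in Java to make sure that all input coming from third-party sites is correctly converted into valid UTF-8 strings that can be stored in a log file?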

Java Strings always represent text as UTF-16 code units (char values), whatever the JVM does internally. Where you really need to pay attention to encodings is when you convert bytes to Strings and vice versa, because that is what an encoding is: a set of rules for converting between bytes and characters/Strings, NOT a property of Strings. In your case, conversion should happen exactly twice: when you read from the third-party sites, and when you write to your logfile.
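To make the byte/String distinction concrete, here is a small sketch (the class name EncodingDemo is just for illustration) that encodes the same String with two charsets and then decodes UTF-8 bytes with the wrong one:

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    public static void main(String[] args) {
        String s = "caf\u00e9"; // c, a, f, e-acute (U+00E9)

        // String -> bytes: the charset decides how each char becomes bytes.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);        // U+00E9 -> 0xC3 0xA9
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // U+00E9 -> 0xE9

        System.out.println(utf8.length);   // 5
        System.out.println(latin1.length); // 4

        // bytes -> String with the same charset round-trips cleanly.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s)); // true

        // Decoding with the wrong charset does not throw; it silently yields
        // mojibake: 0xC3 0xA9 become U+00C3 (A-tilde) and U+00A9 (copyright sign).
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("caf\u00c3\u00a9")); // true
    }
}
```

The last line is exactly the kind of corruption that ends up in logs when bytes are decoded with a guessed or default charset.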

When reading from the third-party sites, you cannot just assume UTF-8, since those sites can use all kinds of different encodings. You need to honor the encoding they declare in the HTTP Content-Type header, the HTML META tag, or the XML declaration. Any decent HTTP client will do that for you, so just let it do its job and don't try anything fancy yourself.
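If you ever have to do the header part by hand, a minimal sketch could look like the following. DeclaredCharset and its methods are hypothetical names, the fallback charset is left to the caller, and only the HTTP header is handled (not HTML META tags or XML declarations). It is also a simplification: a quoted parameter value containing ';' would not be parsed correctly.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class DeclaredCharset {

    // Returns the charset named by the "charset" parameter of a Content-Type
    // header value (e.g. "text/html; charset=ISO-8859-1"), or the fallback if
    // the parameter is absent, malformed, or unsupported by this JVM.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0 || !param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                continue;
            }
            String name = param.substring(eq + 1).trim();
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1); // charset="utf-8"
            }
            try {
                return Charset.isSupported(name) ? Charset.forName(name) : fallback;
            } catch (IllegalCharsetNameException e) {
                return fallback; // syntactically invalid name sent by the remote site
            }
        }
        return fallback;
    }

    // Decodes a response body with the declared charset, never the platform default.
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, fromContentType(contentType, fallback));
    }
}
```

In practice you rarely need this: with Java 11's java.net.http client, HttpResponse.BodyHandlers.ofString() already decodes the body using the charset from the Content-Type response header and falls back to UTF-8 when none is declared or it is unsupported.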

When writing to your logfile, on the other hand, you should make sure you are using UTF-8 and not the platform default encoding (even if that happens to be UTF-8 today, it could change). Configure this in your logging library, or, if you write the files without such a library, pass the charset explicitly when you create an OutputStreamWriter.
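If you write the logfile yourself rather than through a logging library, a minimal sketch (Utf8LogFile and appendLine are made-up names) is to wrap the FileOutputStream in an OutputStreamWriter with an explicit charset:

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8LogFile {

    // Appends one line to the file, always encoded as UTF-8,
    // independent of the JVM's default charset.
    static void appendLine(String path, String line) throws IOException {
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path, true), StandardCharsets.UTF_8))) {
            w.write(line);
            w.write('\n');
        }
    }
}
```

Opening the file per line keeps the sketch short but is slow; a real logger would keep the writer open. java.nio.file.Files.newBufferedWriter(path, StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND) is an equivalent way to get a UTF-8 appending writer.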

Step 1: make sure that the page containing the form is itself in UTF-8.

Step 2: check the headers of the incoming request to see if they give you a character set.

Step 3: don't depend on String(byte[]) or InputStreamReader(InputStream). Always call the overloads that take an explicit character set.

The String(byte[] bytes, Charset charset) constructor lets you specify the character set used for decoding.
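Applied to the servlet question itself, a hedged sketch of Step 3: assume you take the raw, still percent-encoded string from HttpServletRequest.getQueryString() and that the beacon sends UTF-8 (an assumption; use whatever charset Step 2 tells you). You can then decode the parameters with an explicit charset instead of relying on a default. QueryDecoder is a hypothetical helper, and URLDecoder.decode(String, Charset) requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryDecoder {

    // Splits a raw query string such as "id=42&name=caf%C3%A9" and
    // percent-decodes every key and value with the given charset.
    static Map<String, String> parse(String rawQuery, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Explicit charset: %C3%A9 means U+00E9 only if the bytes are UTF-8.
            params.put(URLDecoder.decode(key, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }
}
```

Duplicate keys keep the last value in this sketch. A malformed % escape makes URLDecoder.decode throw IllegalArgumentException, which you may prefer to catch and log rather than silently drop the beacon hit.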
