
How to decode the Unicode encoding in Java?

Our site has a search feature: we build the query, send it in a request, and the vendor returns the response as JSON. The vendor crawls our site, captures its data, and sends the response. In our design we convert the JSON into Java objects using GSON. We declare UTF-8 as the charset in the meta tag.

Depending on the request, the response sometimes contains Unicode-encoded special characters. The browser renders these encoded characters in a strange way. How should I decode this Unicode encoding?

For example, the special character 'ndash' appears in the response encoded as '\u2013'.

To clarify the difference between Unicode and a character encoding:

Unicode

  • is an abstract concept aiming to identify all characters (currently > 110 000).

Character encoding

  • defines how a character can be represented by a sequence of bytes
  • one such encoding is UTF-8, which uses 1-4 bytes to represent a Unicode character
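The 1-4 byte range can be checked directly. The following is a minimal sketch; the class name Utf8Lengths and the helper utf8Length are made up for illustration and are not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Returns the number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // U+0041, 1 byte
        System.out.println(utf8Length("\u00e9"));       // U+00E9, 2 bytes
        System.out.println(utf8Length("\u2013"));       // U+2013 (en dash), 3 bytes
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600, 4 bytes
    }
}
```

The last string is written as a surrogate pair because U+1F600 lies outside the Basic Multilingual Plane, so it occupies two UTF-16 code units inside a Java String but still encodes to four UTF-8 bytes.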

A Java String is always UTF-16. Hence, when you construct a String from bytes, you can use the following String constructor

new String(byte[], encoding)

The second argument should be the encoding the characters are in when the client sends them. If you don't explicitly define an encoding, you will get the default system encoding, which you can examine using Charset.defaultCharset().
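As a rough illustration of why the second argument matters, the sketch below decodes the same three bytes (the UTF-8 form of the en dash, U+2013) with the correct charset and with a wrong one. The class name DecodeBytes and the helper decode are hypothetical; the sample bytes are an assumption about what a client might send.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeBytes {
    // Decodes the bytes with the charset the sender is assumed to have used.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the en dash U+2013
        byte[] enDash = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};

        String right = decode(enDash, StandardCharsets.UTF_8);
        String wrong = decode(enDash, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 1: a single en dash
        System.out.println(wrong.length()); // 3: one garbage char per byte
    }
}
```

Passing a Charset constant from StandardCharsets rather than a charset name string avoids the checked UnsupportedEncodingException that the String-name overload declares.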

You can manually set the default encoding as an argument when starting the JVM

-Dfile.encoding="utf-8"
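To see what the flag actually changed, a small check like the one below can be run with and without it. The class name DefaultEncoding and the helper describeDefaults are hypothetical. Note, as far as I know, that on JDK 18 and later the default charset is UTF-8 regardless of the platform locale, so the flag mostly matters on older JVMs.

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    // Reports the file.encoding system property and the charset the JVM
    // uses when no encoding is given explicitly.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

For example, compare the output of `java DefaultEncoding` with that of `java -Dfile.encoding=UTF-8 DefaultEncoding`.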

Although rarely needed, you can also employ CharsetDecoder / CharsetEncoder.
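One case where CharsetDecoder may be worth the extra code is strict validation: new String(byte[], Charset) silently replaces malformed input with U+FFFD, while a decoder can be told to report it. The following is a sketch under that assumption; the class name StrictDecode and the helper decodeOrNull are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Decodes UTF-8 strictly. Returns null when the bytes are not valid
    // UTF-8 instead of substituting replacement characters.
    static String decodeOrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xE2, (byte) 0x80, (byte) 0x93}; // en dash
        byte[] truncated = {(byte) 0xE2, (byte) 0x80};          // last byte missing

        System.out.println(decodeOrNull(valid) != null);     // true
        System.out.println(decodeOrNull(truncated) == null); // true
    }
}
```

A new decoder is created per call because CharsetDecoder instances are stateful and not safe for concurrent use.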
