Java unicode strange behaviour

I'm using HttpURLConnection to make requests to the Twitter API. The API returns JSON with all the data encoded as UTF-8 (you can see that in the response headers), and I display the data in a .jsp (HTML) page.

I read the response (json) with this piece of code:

BufferedReader in = new BufferedReader(new InputStreamReader(http.getInputStream(),"UTF-8"));
String inputLine;
StringBuffer res = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
    // Append line to 'res', so I can have a string with all the json
    res.append(inputLine);
    // Print the line for debugging
    System.out.println(inputLine);
}
in.close();

Now, here comes the problem. Some values inside the JSON (for now just a String) contain emojis, so they should be represented as Unicode characters. And that's exactly what happens, at least with some of them. Here's an example with a user's name:

Original twitter text:

[image]

What I get from the API's response:

"name":"\uD83C\uDF52UserName"

How it is finally displayed in the .jsp:

[image]

It works fine with this emoji: the name is shown as it appears on Twitter. But look at the text of the following tweet. It's as if the Unicode character were duplicated and the second copy were not displayed, or something similarly strange. Note that in Eclipse's console you see ??, but when the .jsp loads, the emoji itself is shown as it should be. The ?? is not the problem; I guess it's just the console's encoding (although it does hint that something is off with that character, because in the first example the console showed the Unicode escape and not ??).

Original twitter tweet:

[image]

What I get from the API's response:

"text":"?? Segons l'US Department of Justice, els infants que es crien sense pare són:\\n\\n?? 63% de suïcidis.\\n?? 90% d'indigents.\\n?? 85% de desordres en el comportament.\\n?? 71% de l'abandonament escolar.\\n?? 70% de les detencions juvenils.\\n?? 75% d'abús de drogues.\\n?? 75% dels violadors."}

How it is finally displayed in the .jsp:

[image]

The emojis are displayed correctly, but there's always a ? after them, and I don't know why.
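One way to see what actually follows each emoji is to print every code point of the string instead of relying on the console's rendering. The following is a minimal sketch; the class name, the method name, and the sample string are assumptions made for illustration and are not part of the original code.

```java
public class CodePointDump {
    // Returns every code point of the input in "U+XXXX" form, separated by spaces.
    public static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: a diamond, an invisible trailing character, then text.
        String sample = "\u2666\uFE0F 63%";
        System.out.println(describe(sample));
    }
}
```

Running the same method on the string built from the API response should reveal any invisible character that the browser renders as ?.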

Also, I should mention that in the .jsp, in order to convert the Unicode to HTML-compatible code, I use this library. You can see the difference between using the method that converts the Unicode to hex and not using it here:

[image]

[image]

Any idea what's happening here?

The emojis mentioned are: 🍒(U+1F352) ♦️(U+2666) ❗️(U+2757)
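As a rough illustration of how these code points relate to the escapes seen in the JSON, the sketch below converts a code point into its UTF-16 code units. The class and method names are hypothetical; only `Character.toChars` comes from the standard library.

```java
public class SurrogateDemo {
    // Returns the UTF-16 code units of a code point in JSON-style escape notation.
    public static String toEscapes(int codePoint) {
        StringBuilder out = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            // Cast to int because the X conversion does not accept a char argument.
            out.append(String.format("\\u%04X", (int) unit));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEscapes(0x1F352)); // cherries: needs a surrogate pair
        System.out.println(toEscapes(0x2666));  // diamond: fits in one code unit
    }
}
```

U+1F352 lies outside the Basic Multilingual Plane, so it is represented by two code units, which matches the pair in the "name" value above; U+2666 and U+2757 each fit in a single code unit.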

Figured it out. That ? is the character 65039 in decimal (U+FE0F, variation selector-16), so what I've done is replace that character with a space. Now the emojis are shown as I wanted, without that sign after them.

String strFinal = res2.toString().replace((char)65039, ' ');
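A possible variant, not the approach used above, is to delete the character rather than substitute a space, so no extra whitespace ends up after each emoji. The sketch below removes the whole variation-selector range U+FE00 to U+FE0F, which includes character 65039 (U+FE0F); the class and method names are assumptions. Whether dropping the selector changes how a given emoji renders depends on the browser and font, so this should be checked against the actual page.

```java
public class VariationSelectorStripper {
    // Deletes variation selectors (U+FE00 to U+FE0F) instead of replacing them with a space.
    public static String strip(String s) {
        return s.replaceAll("[\\uFE00-\\uFE0F]", "");
    }

    public static void main(String[] args) {
        // Hypothetical input resembling one line of the tweet text.
        String input = "\u2666\uFE0F 63%";
        String cleaned = strip(input);
        System.out.println(cleaned.length()); // length drops from 6 to 5
    }
}
```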

[image]
