
Java UTF-8 encoding

I have a String like this:

String str = "\u0e04\u0e38\u0e13\u0e23\u0e39\u0e49\u0e21\u0e31\u0e49\u0e22\u0e44\u0e14\u0e42\u0e19";

It actually looks like ช1: คุณรู้มั้ยไดโนเสาร์ตั

What I want is to keep the string in its literal, escaped form, so that str.charAt(3) is 'e' rather than a strange character.

How can I do this?

Further explanation: I get this string from a file. I read a line of the file into a string, and that line appears to be "\u0e04\u0e38\u0e13\u0e23\u0e39\u0e49\u0e21\u0e31\u0e49\u0e22\u0e44\u0e14\u0e42\u0e19". So in memory, the string looks like this.

Code here:

// JSONObject is assumed to be org.json.JSONObject.
FileReader fr = new FileReader("sample2.json");
BufferedReader br = new BufferedReader(fr);

String line;
while ((line = br.readLine()) != null)
{
    // Each line of the file is parsed as one JSON object.
    JSONObject data = new JSONObject(line);
    String text = data.getString("text");
}

This line in the file is "\u0e04\u0e38\u0e13\u0e23\u0e39\u0e49\u0e21\u0e31\u0e49\u0e22\u0e44\u0e14\u0e42\u0e19".

Now I want to keep the string text in its original format.

You just need to escape each backslash:

String str = "\\u0e04\\u0e38...";
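To make the difference between the two literals concrete, here is a minimal sketch (the class name EscapeDemo and its field names are made up for illustration). With a single backslash the Java compiler translates the escape into one Thai character; with a doubled backslash the string keeps the backslash, the letter u, and the hex digits as ordinary characters.

```java
class EscapeDemo {
    // Single backslash: the compiler stores two Thai characters.
    static final String DECODED = "\u0e04\u0e38";
    // Doubled backslash: the string stores the escape text itself, character by character.
    static final String LITERAL = "\\u0e04\\u0e38";

    public static void main(String[] args) {
        System.out.println(DECODED.length());   // prints 2
        System.out.println(LITERAL.length());   // prints 12
        System.out.println(LITERAL.charAt(3));  // prints e
    }
}
```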

I guess you've read this string from a file or stream. It seems you read it using the wrong encoding (not the one the String was encoded with when it was written to that file or stream). I think that's why you get this issue.

We don't need to worry about encodings while Strings are in memory (in the memory of the JVM, for example). Encodings start to matter when you write an in-memory String to a file or stream, or read one back from a file or stream.
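As a rough illustration of why the encoding matters only at that boundary, the sketch below encodes a String to UTF-8 bytes and decodes the same bytes with two different charsets. The class CharsetDemo and its method roundTrip are hypothetical names, not part of the question's code. Note that FileReader constructed with only a file name decodes with the platform default charset, so naming the charset explicitly (for example with Files.newBufferedReader(path, StandardCharsets.UTF_8)) is one way to rule this problem out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Encodes s as UTF-8 bytes, then decodes those bytes with the given charset.
    static String roundTrip(String s, Charset readAs) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String thai = "\u0e04";
        // Same charset on both sides: the character survives.
        System.out.println(roundTrip(thai, StandardCharsets.UTF_8).equals(thai));  // prints true
        // Wrong charset on the read side: each of the three UTF-8 bytes becomes its own char.
        System.out.println(roundTrip(thai, StandardCharsets.ISO_8859_1).length()); // prints 3
    }
}
```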

Okay, this looks dumb, but it will work in your case:

Instead of:

JSONObject data = new JSONObject(line);

use:

JSONObject data = new JSONObject(line.replaceAll("\\\\", "\\\\\\\\"));

The problem is that the JSON parser decodes the \uXXXX escapes into the actual Unicode characters for your 'convenience'.
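Doubling every backslash also alters any other escapes in the line, such as \" or \n, so it suits input whose only escapes are the Unicode ones. A different approach, sketched here as an alternative and not taken from the answer above, is to parse the JSON normally and then turn the decoded text back into escape form. The helper below is hypothetical (UnicodeEscaper and escapeNonAscii are made-up names) and assumes lowercase four-digit hex is the form you want.

```java
class UnicodeEscaper {
    // Hypothetical helper: rewrites each non-ASCII UTF-16 char as backslash, 'u', and four hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                sb.append(c);  // plain ASCII is kept unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = "\u0e04\u0e38";  // what the JSON parser hands back
        String escaped = escapeNonAscii(decoded);
        System.out.println(escaped.charAt(3));  // prints e
    }
}
```

Because it works per UTF-16 char, a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is also how JSON writes such characters.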
