Reading UTF-8 encoded text from InputStream

I'm having problems reading all Japanese/Chinese characters from an input stream.

Basically, I'm retrieving a JSON object from an API.

Below is my code:

    try {
        URL url = new URL(string);
        BufferedReader br = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
        result = br.readLine(); // reads only the first line of the response
        br.close();
    } catch (Exception e) {
        // any exception is silently ignored here
    }

For some reason, not all characters are read by the input stream. What could be the problem?

To be specific, some characters appear correctly when I print them out in the console, while others appear as black boxes with question marks. There are no black boxes with question marks when I check the actual JSON object through a browser.

What you see when "printing to a console" really has nothing to do with whether data was read or not, but has everything to do with the capabilities of your console.

If you are fetching data from a URL, and you know for sure that the bytes you have fetched represent UTF-8 encoded text, and the entire data fits on one line of text, then there is no reason why your code should not work.
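If the "fits on one line" condition is in doubt, the stream can be read to the end instead of stopping at the first line break. A minimal sketch, assuming the bytes really are UTF-8; the class and method names (`Utf8Reader`, `readAll`) are made up for illustration and are not part of the original code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class Utf8Reader {
    // Hypothetical helper: decodes the whole stream as UTF-8,
    // keeping line breaks, and closes the stream when done.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = br.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

With the question's code this would be called as `result = Utf8Reader.readAll(url.openStream());`.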

It sounds like you doubt the result only because of what appears when you print the text to your console. Perhaps your console is not set to render UTF-8 encoded text? Perhaps your console font does not have glyphs for all of the characters you are printing?
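On the Java side, one thing that can be controlled is the encoding used when writing to the console. A minimal sketch, assuming the terminal itself is configured for UTF-8 and has a font with the needed glyphs (neither of which Java can fix); `Utf8Console` is a hypothetical name:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Console {
    // Wraps a stream so that printed text is encoded as UTF-8
    // instead of the platform default charset.
    static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }
}
```

Usage would look like `Utf8Console.utf8(System.out).println(result);`. If the output still shows boxes, the data may be fine and the console is the limiting factor.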

Here are two things you can try:

  1. Instead of writing the text to your console, save it to a file. Then use a command like hexdump -C (on a *nix system; I have no idea how to do that in Windows) and look at the binary representation to make sure all your expected characters are there.

  2. Save your data to a text file, then open it in a web browser, since browsers probably have much richer font support than a console.
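Both checks can also be done from Java itself, which avoids depending on a platform-specific tool. A minimal sketch, assuming the retrieved text is already in a `String`; `HexCheck`, `save`, and `toHex` are hypothetical names introduced here:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class HexCheck {
    // Writes the text as UTF-8 and returns the bytes actually stored in the file.
    static byte[] save(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return Files.readAllBytes(file);
    }

    // Formats bytes as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

A correctly decoded Japanese or Chinese character usually shows up as three bytes in the dump, while a run of `ef bf bd` (the UTF-8 form of the replacement character U+FFFD) suggests the text was already damaged before it was written. The saved file can then be opened in a browser as described in step 2.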

If you still suspect you've read the remote data incorrectly, you can run your retrieved text through a JSON validator, just to make sure.

Try the following: "ISO-8859-1".
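Whether a different charset helps depends on what the server actually sends. ISO-8859-1 maps every byte to exactly one character, so it cannot by itself produce Japanese or Chinese characters from UTF-8 data. A small sketch for comparing what each decoder yields for the same bytes; `CharsetCompare` is a hypothetical name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetCompare {
    // Decodes the same raw bytes with the given charset so the results can be compared.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }
}
```

For example, the three UTF-8 bytes of a single kanji decode to one character with `StandardCharsets.UTF_8` but to three unrelated characters with `StandardCharsets.ISO_8859_1`.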
