
Extracting text from InputStreamReader not working in UTF-8

I'm trying to read the following API text page:

https://api.stackexchange.com/2.2/users?page=1&pagesize=9&fromdate=1221436800&todate=1523318400&order=desc&min=1&max=2000000&sort=reputation&site=stackoverflow

using InputStreamReader and I want to extract the text and print it line by line.

The issue is that the format of the text is not recognized as UTF-8, so the output is garbled, like: ????

The code of the method is the following:

String testURL = "https://api.stackexchange.com/2.2/users?page=1&pagesize=9&fromdate=1221436800&todate=1523318400&order=desc&min=1&max=2000000&sort=reputation&site=stackoverflow";

            URL url = null;
            try
            {
                url = new URL(testURL);
            } catch (MalformedURLException e1)
            {
                e1.printStackTrace();
            }

            InputStream is = null;

            try
            {
                is = url.openStream();
            } catch (IOException e1)
            {
                e1.printStackTrace();
            }


            try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO-8859-1")))
            {
                String line;

                while ((line = br.readLine()) != null)
                {
                    System.out.println(line);
                }

            } catch (MalformedURLException e)
            {
                e.printStackTrace();

            } catch (IOException e)
            {
                e.printStackTrace();

            }

I've tried changing the line

try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8")))

to

try (BufferedReader br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8)))

or to

try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO-8859-1")))

Unfortunately, the issue still persists. I would really appreciate any tips so I can solve this problem. Thank you.

To analyse your problem I tried to download from the given URL with curl (using the -i option to see the HTTP response header lines) and got:

Cache-Control: private
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST
Access-Control-Allow-Credentials: false
X-Content-Type-Options: nosniff
Date: Sat, 21 Apr 2018 21:48:42 GMT
Content-Length: 85

▒VJ-*▒/▒▒LQ▒210ЁrsS▒▒▒S▒▒▒▒3KR2▒▒R
 K3▒RS▒`J▒sA▒I▒)▒▒E@NIj▒R-g▒▒PP^C

The line Content-Encoding: gzip tells you that the content is gzip-compressed. That is why the body above looks like binary garbage, and why no choice of charset in InputStreamReader can make it readable.

Hence, in your Java program you need to decompress the gzip content before decoding it as text.
You can do this simply by replacing the line

is = url.openStream();

with

is = new GZIPInputStream(url.openStream());

An even better approach would be to get the actual Content-Encoding and, depending on that, decide whether you need to decompress the content:

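URLConnection connection = url.openConnection();
is = connection.getInputStream();
// getContentEncoding() returns null when the header is absent,
// so put the constant on the left to avoid a NullPointerException
String contentEncoding = connection.getContentEncoding();
if ("gzip".equalsIgnoreCase(contentEncoding))
    is = new GZIPInputStream(is);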
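
Putting the pieces together, here is a minimal sketch of the whole read path: decompress if the header says gzip, then decode as UTF-8 and read line by line. The class name GzipBodyReader and the helper methods decode, readBody and gzip are hypothetical names chosen for this illustration, not part of any library. To keep the example runnable without network access, main feeds it an in-memory gzip payload instead of the real API response.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration only.
class GzipBodyReader {

    // Wraps the raw stream in a GZIPInputStream only when the
    // Content-Encoding value is "gzip"; a null value (header absent)
    // leaves the stream untouched.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        return "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(raw) : raw;
    }

    // Reads the (possibly compressed) body as UTF-8 text, line by line,
    // and returns it with each line terminated by '\n'.
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(decode(raw, contentEncoding), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Demo helper: gzip-compresses a string in memory so that the
    // example does not depend on a live HTTP connection.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the API response body (made-up sample data).
        String json = "{\"display_name\":\"J\u00fcrgen\"}";
        InputStream compressed = new ByteArrayInputStream(gzip(json));
        System.out.print(readBody(compressed, "gzip"));
    }
}
```

With a real connection you would pass connection.getInputStream() and connection.getContentEncoding() to readBody instead of the in-memory stream.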
