Uncompress GZIPed HTTP Response in Java

I'm trying to uncompress a GZIPed HTTP response using GZIPInputStream. However, I always get the same exception when I try to read the stream: java.util.zip.ZipException: invalid bit length repeat

My HTTP request headers:

GET www.myurl.com HTTP/1.0\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
X-Requested-With: XMLHttpRequest\r\n
Cookie: Some Cookies\r\n\r\n

At the end of the HTTP response header, I get path=/Content-Encoding: gzip, followed by the gzipped response.

I tried two similar pieces of code to decompress it:

UPDATE: In the following code, tBytes = (the string after 'path=/Content-Encoding: gzip').getBytes();

GZIPInputStream  gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));

StringBuffer  szBuffer = new StringBuffer ();

byte  tByte [] = new byte [1024];

while (true)
{
    int  iLength = gzip.read (tByte, 0, 1024); // <-- Error comes here

    if (iLength < 0)
        break;

    szBuffer.append (new String (tByte, 0, iLength));
}

And this one, which I found on this forum:

InputStream     gzipStream = new GZIPInputStream   (new ByteArrayInputStream (tBytes));
Reader          decoder    = new InputStreamReader (gzipStream, "UTF-8");//<- I tried ISO-8859-1 and get the same exception
BufferedReader  buffered   = new BufferedReader    (decoder);

I guess this is an encoding error.

Best regards,

bill0ute

You don't show how you get the tBytes that you use to set up the gzip stream here:

GZIPInputStream  gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));

One explanation is that you are including the entire HTTP response in tBytes. Instead, it should contain only the content after the HTTP headers.

Another explanation is that the response is chunked.
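If the response does use chunked transfer encoding, the body has to be de-chunked before it is handed to GZIPInputStream. The following is only a minimal sketch of that step, assuming the stream is positioned at the first byte of the body; the class and method names are made up for illustration, chunk extensions are skipped, and trailer fields are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class ChunkedBodyDecoder {

    /** Reads a chunked body from the stream and returns the reassembled payload. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            // Each chunk starts with its size in hexadecimal, optionally
            // followed by ";extension", and ends with CRLF.
            String sizeLine = readLine(in);
            int semicolon = sizeLine.indexOf(';');
            if (semicolon >= 0) {
                sizeLine = sizeLine.substring(0, semicolon);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) {
                break; // last chunk; any trailer fields are ignored here
            }
            byte[] chunk = new byte[size];
            int read = 0;
            while (read < size) {
                int n = in.read(chunk, read, size - read);
                if (n < 0) {
                    throw new EOFException("Stream ended inside a chunk");
                }
                read += n;
            }
            out.write(chunk, 0, size);
            readLine(in); // consume the CRLF that follows the chunk data
        }
        return out.toByteArray();
    }

    /** Reads bytes up to the next LF and returns them without the CR/LF. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            throw new EOFException("Stream ended while reading a chunk header");
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

The byte array returned by decode would then be the input for the gzip stream, in place of the raw bytes read from the socket.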

edit: You are taking the data after the Content-Encoding line as the message body. However, according to the HTTP 1.1 specification, header fields do not come in any particular order, so this is very dangerous.

As explained in this part of the HTTP specification, the message body of a request or response doesn't come after a particular header field but after the first empty line:

Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message). Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.

You still haven't shown exactly how you compose tBytes, but at this point I think you're erroneously including the empty line in the data that you try to decompress. The message body starts after the CRLF characters of the empty line.

May I suggest that you use the httpclient library to extract the message body instead?
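To make the point about the empty line concrete, here is a rough sketch that locates the first CRLF CRLF sequence in the raw response and decompresses everything after it. It assumes the whole response is already in a byte array, that the body is not chunked, and that the text is UTF-8; the class and method names are invented for this example. It works on bytes throughout, because passing compressed data through a String and getBytes() may alter it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any library.
final class GzipBodyExtractor {

    /** Returns the index of the first body byte, or -1 if there is no empty line. */
    static int bodyOffset(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4; // first byte after the empty line
            }
        }
        return -1;
    }

    /** Decompresses gzip data held in memory. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Splits headers from body at the empty line, then decompresses the body. */
    static String decodeBody(byte[] rawResponse) throws IOException {
        int offset = bodyOffset(rawResponse);
        if (offset < 0) {
            throw new IOException("No empty line found after the headers");
        }
        byte[] body = Arrays.copyOfRange(rawResponse, offset, rawResponse.length);
        return new String(gunzip(body), StandardCharsets.UTF_8); // UTF-8 is an assumption
    }

    /** Compresses bytes with gzip; only used to build sample input. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a fake response: headers, empty line, gzipped body.
        byte[] head = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] body = gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        response.write(head);
        response.write(body);
        System.out.println(decodeBody(response.toByteArray()));
    }
}
```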
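As an illustration of letting a client do the header parsing, the sketch below uses the JDK's own java.net.HttpURLConnection rather than the httpclient library mentioned above, simply to avoid an external dependency. It separates headers from the body and undoes chunked transfer encoding itself, so only the gzip step remains. The class and method names are hypothetical, and treating the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of any library.
final class GzipHttpFetch {

    /** Wraps the body stream in a GZIPInputStream when the header says it is gzipped. */
    static InputStream decodedStream(String contentEncoding, InputStream raw) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Drains a stream into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Fetches a URL and returns the decompressed body as text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream body = decodedStream(conn.getContentEncoding(), conn.getInputStream())) {
            return new String(readAll(body), StandardCharsets.UTF_8); // UTF-8 is an assumption
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling GzipHttpFetch.fetch with the page URL would then return the decoded text without any manual handling of the header block.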

Well, here is the problem I can see:

int  iLength = gzip.read (tByte, 0, 1024);

Use the following to fix that:

byte[] buff = new byte[1024];
byte[] emptyBuff = new byte[1024];
StringBuffer unGzipRes = new StringBuffer();

int byteCount = 0;
while ((byteCount = gzip.read(buff, 0, 1024)) > 0) {
    // only append the buff elements that
    // contains data
    unGzipRes.append(new String(Arrays.copyOf(
            buff, byteCount), "utf-8"));

    // empty the buff for re-usability and
    // prevent dirty data attached at the
    // end of the buff
    System.arraycopy(emptyBuff, 0, buff, 0,
            1024);
}
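One caveat with the loop above: it decodes each 1024-byte chunk on its own, so a multi-byte UTF-8 character that straddles two reads may come out garbled. A possible variant, sketched here with made-up class and method names and assuming the input is a valid gzip stream of UTF-8 text, lets an InputStreamReader do the decoding so that split characters are carried over between reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any library.
final class GzipTextReader {

    /** Decompresses gzip data and decodes it as UTF-8 text. */
    static String readGzippedText(byte[] gzipped) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)),
                StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            // The reader keeps partial multi-byte sequences internally,
            // so characters are never split between appends.
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }

    /** Compresses text with gzip; only used to build sample input. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // An odd one-byte prefix followed by two-byte characters makes
        // some of them straddle 1024-byte boundaries.
        StringBuilder sample = new StringBuilder("x");
        for (int i = 0; i < 2000; i++) {
            sample.append('\u00e9');
        }
        String roundTripped = readGzippedText(gzip(sample.toString()));
        System.out.println(roundTripped.equals(sample.toString()));
    }
}
```

Since only the first byteCount bytes of the buffer are ever used, clearing the buffer between reads is not needed in either version.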
