Extracting gzip content from a raw HTTP response

Question

I am doing a plain HTTP GET (not HTTPS; the URL is http://www.example.com) using only the socket module. I recv the response, which contains everything the server transferred (the headers and a gzip-encoded body). Now I want to extract the gzipped body. I guess it should start at \x1f\x8b\x08, but I don't know where it should end. Any help?

Below is my raw response:

HTTP/1.1 200 OK\r\n
Header Part\r\n
\r\n
some_number_here\r\n
\x1f\x8b\x08 ......
......\r\n
0\r\n
\r\n

Answer 1

I bet that the header part contains a Transfer-Encoding: chunked header.

This is an HTTP/1.1 response, not an HTTP/1.0 one, and an HTTP/1.1 client is required to understand chunked transfer encoding.
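To check that guess on the bytes you already received, something like the following could work. It is only a sketch: `parse_head` and `is_chunked` are hypothetical helper names, and it assumes the whole header block has already arrived and that no header name is repeated.

```python
def parse_head(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("end of headers not received yet")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def is_chunked(headers: dict) -> bool:
    """True if the response announces chunked transfer encoding."""
    return "chunked" in headers.get("transfer-encoding", "").lower()
```

If `is_chunked` returns True, the `body` part is not the gzip stream itself but a sequence of chunks wrapping it.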

You have two solutions:

- tell the server you do not understand HTTP/1.1 by using HTTP/1.0 on the first line of your requests, as in GET /foo HTTP/1.0
- implement chunked transfer parsing.
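For the first option, the request bytes could be built roughly like this. `build_http10_request` is a made-up helper name, and the Accept-Encoding header is an assumption based on the question wanting a gzipped body.

```python
def build_http10_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (illustrative helper)."""
    lines = [
        f"GET {path} HTTP/1.0",    # 1.0 on the request line, as suggested above
        f"Host: {host}",           # not required by 1.0, but most servers expect it
        "Accept-Encoding: gzip",   # assumption: you still want a gzipped body
        "",                        # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

A server answering an HTTP/1.0 request should not chunk the body, so the body then normally ends at Content-Length bytes or when the server closes the connection.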

The parsing is not that hard. Instead of a raw body you have a body split into parts (chunks). Each part starts with the chunk size (the some_number_here\r\n line), which is a hexadecimal number (careful: 10 means 16 and 1c means 28).

Then comes the raw chunk content, followed by \r\n.

Then the next chunk.

This repeats until you reach the last chunk, which is announced with a size of 0 (0\r\n\r\n).
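The steps above can be sketched as follows, assuming the complete chunked body (everything after the blank line that ends the headers) is already in memory. `dechunk` is a hypothetical helper name; the sketch ignores trailers after the last chunk.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Join the chunks of a complete chunked body into the raw payload."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)            # end of the size line
        # The size is hexadecimal; drop any ";extension" after it.
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:                             # last chunk: 0\r\n\r\n
            break
        if pos + size > len(body):
            raise ValueError("body is truncated, keep reading the socket")
        out += body[pos:pos + size]
        pos += size + 2                           # skip the data and its \r\n
    return bytes(out)


def extract_gzip_body(body: bytes) -> bytes:
    """Dechunk, then decompress the gzip stream (starts with \\x1f\\x8b\\x08)."""
    return gzip.decompress(dechunk(body))
```

So the gzip data does not end at a marker of its own: it ends where the zero-size chunk begins, once the chunk framing has been removed.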

Warning: the server may pause between chunks, so you have to keep reading from the socket until you see this last chunk.
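One way to keep reading until the last chunk is to wrap the socket in a buffered file object with `sock.makefile("rb")` and let its blocking reads do the waiting. This is a sketch, assuming a blocking socket and that the status line and headers have already been read from the same file object; `read_chunked` is a hypothetical name, and `io.BytesIO` stands in for the socket file in the demo.

```python
import io


def read_chunked(fp) -> bytes:
    """Read a chunked body from a binary file object until the last chunk."""
    out = bytearray()
    while True:
        size_line = fp.readline()                 # blocks until a full line arrives
        if not size_line:
            raise EOFError("connection closed before the last chunk")
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            # Skip optional trailer lines up to the final blank line.
            while fp.readline() not in (b"\r\n", b"\n", b""):
                pass
            return bytes(out)
        data = fp.read(size)                      # blocks until size bytes or EOF
        if len(data) != size:
            raise EOFError("connection closed in the middle of a chunk")
        out += data
        fp.readline()                             # the \r\n after the chunk data


# Stand-in for sock.makefile("rb"), positioned just after the header block.
demo = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(read_chunked(demo))  # b'hello world'
```

Parsing as you read avoids guessing the end of the body by searching the received bytes for 0\r\n\r\n, which could in principle also occur inside the compressed data.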

PS: do not implement HTTP over raw sockets for anything that will go into production later. There are plenty of HTTP clients available, including in Python, and making your own secure and robust is a huge job.

Solution 1
0 · Accepted · 2019-08-26 14:12:31