Extract gzip content from raw HTTP response
I am trying to do an HTTP GET (plain http, not https, i.e. the URL is http://www.example.com) using only the socket module. I then recv the response, which contains all the data transferred from the server (headers plus a gzip-encoded body), and I try to extract the gzipped body content. I guess this content should start at \x1f\x8b\x08, but I don't know where it should end. Any help?
Below is my raw response:
HTTP/1.1 200 OK\r\n
Header Part\r\n
\r\n
some_number_here\r\n
\x1f\x8b\x08 ......
......\r\n
0\r\n
\r\n
I bet that in the header part you have a Transfer-Encoding: chunked header.
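To check that guess, the header block can be separated from the body at the first blank line (\r\n\r\n) and inspected. The following is only a minimal sketch: the helper names split_response and is_chunked are made up for illustration, and repeated header names simply overwrite each other, which a real client would have to handle.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body).

    Hypothetical helper: headers are returned as a dict keyed by the
    lower-cased header name; a repeated header keeps only its last value.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header terminator not found; response is incomplete")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def is_chunked(headers) -> bool:
    """True if the Transfer-Encoding header mentions 'chunked'."""
    return "chunked" in headers.get("transfer-encoding", "").lower()
```

If is_chunked returns True, the body bytes are not the gzip stream itself but the gzip stream wrapped in chunk framing.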
This is an HTTP/1.1 response, not an HTTP/1.0 one, and understanding chunked transmission is required in version 1.1 of HTTP.
You have two solutions:
- Tell the server you do not understand HTTP/1.1 by using HTTP/1.0 in your requests, on the first line, as in GET /foo HTTP/1.0.
- Parse the chunked transmission yourself.
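A rough sketch of the HTTP/1.0 route could look like the code below. The function names, the Accept-Encoding: gzip header and the 10 second timeout are assumptions made for illustration. With HTTP/1.0 the server normally closes the connection when the response is complete, so reading until recv returns an empty result is usually enough, although a server is not guaranteed to behave that way.

```python
import socket


def build_http10_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",            # not required by 1.0, but virtual hosts need it
        "Accept-Encoding: gzip",    # assumption: we still want a gzipped body
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_http10(host: str, path: str = "/", port: int = 80,
                 timeout: float = 10.0) -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_http10_request(host, path))
        received = []
        while True:
            data = sock.recv(4096)
            if not data:            # empty result: the peer closed the socket
                break
            received.append(data)
    return b"".join(received)
```

The value returned by fetch_http10 still contains the status line and headers; everything after the first \r\n\r\n is the body.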
The parsing is not so hard. Instead of a raw body you have a body split into parts (chunks). Each part starts with the chunk size (the some_number_here\r\n stuff), which is a hexadecimal number (careful: 10 means 16, and 1c means 28).
Then you have the raw chunk content, followed by \r\n.
Then the next chunk.
This goes on until you reach the last chunk, which is advertised with a size of 0 (0\r\n\r\n).
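The chunk layout described above can be decoded with a short loop. This is a minimal sketch, assuming the whole body (everything after the header block) is already in memory. The name decode_chunked is made up for illustration, and trailer headers after the last chunk are ignored.

```python
import gzip


def decode_chunked(body: bytes) -> bytes:
    """Remove chunk framing from a complete chunked HTTP body."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("incomplete chunk-size line")
        # The size is hexadecimal; drop any ';extension' that may follow it.
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:               # last chunk reached
            break
        chunk = body[pos:pos + size]
        if len(chunk) < size:
            raise ValueError("incomplete chunk data")
        decoded += chunk
        pos += size + 2             # skip the data and the CRLF after it
    return bytes(decoded)
```

Once the framing is removed, the result is the gzip stream starting at \x1f\x8b\x08, so gzip.decompress(decode_chunked(body)) should give the original content.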
Warning: the server may take some time between chunks, so you have to keep reading the socket until you see this last chunk.
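One way to handle that is to parse the chunks while reading, asking the socket for more bytes whenever the buffer runs short. The sketch below is only an illustration: read_chunked_body is a made-up name, recv stands for any callable that behaves like socket.recv, and initial is whatever body bytes were already received along with the headers. Trailer headers after the last chunk are read and discarded.

```python
def read_chunked_body(recv, initial: bytes = b"") -> bytes:
    """Read a chunked body incrementally until the zero-size chunk."""
    buf = bytearray(initial)
    body = bytearray()

    def fill():
        data = recv(4096)
        if not data:
            raise ConnectionError("connection closed before the last chunk")
        buf.extend(data)

    def read_line() -> bytes:
        # Keep receiving until a full CRLF-terminated line is buffered.
        while b"\r\n" not in buf:
            fill()
        line, _, rest = bytes(buf).partition(b"\r\n")
        buf[:] = rest
        return line

    def read_exact(n: int) -> bytes:
        # Keep receiving until at least n bytes are buffered.
        while len(buf) < n:
            fill()
        data = bytes(buf[:n])
        del buf[:n]
        return data

    while True:
        size_field = read_line().split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break
        body += read_exact(size)
        read_exact(2)               # CRLF that follows the chunk data
    while read_line() != b"":       # skip optional trailers up to the blank line
        pass
    return bytes(body)
```

With a real connection this would be called as read_chunked_body(sock.recv, initial=leftover), where sock and leftover are hypothetical names for the connected socket and the bytes that followed the header block in the first reads.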
PS: do not try to implement HTTP on raw sockets for anything that will go into production later. There are a lot of HTTP clients available, even in Python, and it is a huge job to get something secure and robust.