Extract gzip content from raw HTTP response

I am trying to do a plain HTTP GET (http, not https; i.e. the URL is http://www.example.com) using only the socket module. I recv the response, which contains everything the server transferred (the headers plus a gzip-encoded body), and then I try to extract the gzipped body. I guess this content should start at \x1f\x8b\x08, but I don't know where it should end. Any help?

Below is my raw response:

HTTP/1.1 200 OK\r\n
Header Part\r\n
\r\n
some_number_here\r\n
\x1f\x8b\x08 ......
......\r\n
0\r\n
\r\n

I bet that the header part contains a Transfer-Encoding: chunked header.

This is an HTTP/1.1 response, not an HTTP/1.0 one, and an HTTP/1.1 client is required to understand chunked transfer encoding.

You have two solutions:

  • Tell the server that you do not understand HTTP/1.1 by using HTTP/1.0 on the first line of your requests, as in GET /foo HTTP/1.0.
  • Implement the chunked transmission parsing.
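For the first option, a minimal sketch could look like the following. The helper names build_http10_get and gunzip_body are hypothetical, and the sketch assumes the server honours the HTTP/1.0 request line by sending an unchunked body and closing the connection when it is done.

```python
import gzip


def build_http10_get(host: str, path: str = "/") -> bytes:
    """Build a GET request that announces HTTP/1.0 on its first line."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Accept-Encoding: gzip",  # assumption: the client asked for gzip
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def gunzip_body(raw_response: bytes) -> bytes:
    """Split a complete, unchunked response at the blank line and gunzip the body."""
    _head, sep, body = raw_response.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line between headers and body")
    return gzip.decompress(body)
```

With this approach the body simply runs from the blank line to the end of the received data, so there is no end marker to look for.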

The parsing is not that hard. Instead of a raw body you have a body split into parts (chunks). Each part starts with its chunk size (the some_number_here\r\n line), which is a hexadecimal number (careful: 10 means 16, 1c means 28).
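As a small illustration of the hexadecimal size line, here is a sketch of a parser for it. The function name parse_chunk_size is made up for this example; the handling of an optional ";extension" suffix follows the chunked encoding grammar and is not something shown in the response above.

```python
def parse_chunk_size(line: bytes) -> int:
    """Turn a chunk-size line into an integer number of bytes."""
    # Anything after ';' is an optional chunk extension and is ignored here.
    size_field = line.split(b";", 1)[0].strip()
    # The size is written in hexadecimal, so base 16 is required.
    return int(size_field.decode("ascii"), 16)


print(parse_chunk_size(b"10\r\n"))  # 16
print(parse_chunk_size(b"1c\r\n"))  # 28
```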

Then you have the raw chunk content, followed by \r\n.

Then the next chunk.

This repeats until you reach the last chunk, which is advertised with a size of 0 (0\r\n\r\n).
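Put together, the steps above can be sketched as a small decoder that works on a response that has already been received completely. The name decode_chunked and the locally built sample response are illustrative assumptions; this sketch ignores trailers and is not hardened against malformed input.

```python
import gzip


def decode_chunked(body: bytes) -> bytes:
    """Join the chunks of a complete chunked body into the original payload."""
    payload = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)  # end of the chunk-size line
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:  # the zero-size chunk marks the end of the body
            return bytes(payload)
        start = eol + 2
        chunk = body[start:start + size]
        if len(chunk) < size:
            raise ValueError("body ends in the middle of a chunk")
        payload += chunk
        pos = start + size + 2  # skip the CRLF that follows the chunk data


# Hypothetical response, built locally so the sketch runs without a network.
original = b"hello chunked world"
gz = gzip.compress(original)
body = (
    b"%x\r\n" % 10 + gz[:10] + b"\r\n"
    + b"%x\r\n" % len(gz[10:]) + gz[10:] + b"\r\n"
    + b"0\r\n\r\n"
)
raw = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n" + body
_headers, _sep, chunked = raw.partition(b"\r\n\r\n")
print(gzip.decompress(decode_chunked(chunked)))
```

Note that the gzip stream is only valid once all chunks are joined, which is why decompression happens after decoding rather than per chunk.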

Warning: the server may take some time between chunks, so you have to keep reading from the socket until you see this last chunk.
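One way to keep reading until the last chunk, sketched under the assumption of a blocking socket, is to wrap the socket with sock.makefile("rb") and let the buffered reader wait for each size line and each chunk. The function name read_chunked is hypothetical, and the caller is assumed to have already consumed the status line and headers from the same stream.

```python
def read_chunked(stream) -> bytes:
    """Read a chunked body from a binary file-like object up to the zero-size chunk."""
    payload = bytearray()
    while True:
        size_line = stream.readline()  # waits until a full line has arrived
        if not size_line:
            raise EOFError("connection closed before the last chunk")
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            # Consume optional trailer lines up to the final blank line.
            while stream.readline() not in (b"\r\n", b"\n", b""):
                pass
            return bytes(payload)
        data = stream.read(size)  # waits for `size` bytes or end of stream
        if len(data) < size:
            raise EOFError("connection closed in the middle of a chunk")
        payload += data
        stream.readline()  # the CRLF that terminates the chunk data


# Possible usage with a connected blocking socket named `sock` (an assumption):
#     with sock.makefile("rb") as stream:
#         while stream.readline() not in (b"\r\n", b""):  # skip status line and headers
#             pass
#         gzipped = read_chunked(stream)
```

Reading through a buffered file object avoids searching the received bytes for 0\r\n\r\n, a byte sequence that can also occur by chance inside compressed data.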

PS: do not implement HTTP over raw sockets for anything that will go into production later. There are plenty of HTTP clients available, including in Python, and getting something secure and robust is a huge job.
