How to decode gzip-compressed source code in Python

I am trying to get the source code of a PHP web page through a proxy, but the response body shows up as non-printable characters. The output I get is as follows:

"Date: Tue, 09 Feb 2016 10:29:14 GMT
Server: Apache/2.4.9 (Unix) OpenSSL/1.0.1g PHP/5.5.11 mod_perl/2.0.8-dev Perl/v5.16.3
X-Powered-By: PHP/5.5.11
Set-Cookie: PHPSESSID=jmqasueos33vqoe6dbm3iscvg0; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 577
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html

�TMo�@�G����7�)P�H�H�DS��=U�=�U�]˻��_�Ycl�T�*�>��eg��
                                                          ����Z�
                                                                �V�N�f�:6�ԫ�IkZ77�A��nG�W��ɗ���RGY��Oc`-ο�ƜO��~?�V��$�
                            �l4�+���n�].W��TLJSx�/|�n��#���>��r����;�l����H��4��f�\  �SY�y��7��"

How can I decode this data using Python? I tried:

decd=zlib.decompress(data, 16+zlib.MAX_WBITS)

but it does not return the decoded data.

The proxy I am using works fine for a few other web applications, but for some it shows non-printable characters. How can I decode this?

Because I am going through a proxy, I do not want to use get(), urlopen(), or any other request call from the Python program.

One obvious way to do this is to extract the compressed data from the response and decompress it using GzipFile().read(). Splitting the response this way may be prone to failure, but here it goes:

from gzip import GzipFile
from StringIO import StringIO

http = 'HTTP/1.1 200 OK\r\nServer: nginx\r\nDate: Tue, 09 Feb 2016 12:02:25 GMT\r\nContent-Type: application/json\r\nContent-Length: 115\r\nConnection: close\r\nContent-Encoding: gzip\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Credentials: true\r\n\r\n\x1f\x8b\x08\x00\xa0\xda\xb9V\x02\xff\xab\xe6RPPJ\xaf\xca,(HMQ\xb2R()*M\xd5Q\x00\x89e\xa4&\xa6\xa4\x16\x15\x03\xc5\xaa\x81\\\xa0\x80G~q\t\x90\xa7\x94QRR\x90\x94\x99\xa7\x97_\x94\xae\x04\x94\xa9\x85(\xcfM-\xc9\xc8\x07\x99\xa0\xe4\xee\x1a\xa2\x04\x11\xcb/\xcaL\xcf\xcc\x03\x89\x19Z\x1a\xe9\x19\x9aY\xe8\x19\xea\x19*q\xd5r\x01\x00\r(\xafRu\x00\x00\x00'

body = http.split('\r\n\r\n', 1)[1]
print GzipFile(fileobj=StringIO(body)).read()

Output

{
  "gzipped": true, 
  "headers": {
    "Host": "httpbin.org"
  }, 
  "method": "GET", 
  "origin": "192.168.1.1"
}
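The snippet above relies on Python 2 (the StringIO module and the print statement). As a rough sketch of the same split-then-gunzip idea in Python 3, assuming the raw response is available as bytes, something like the following might work. The function name decode_gzip_response and the sample response are hypothetical and exist only to make the example runnable.

```python
import gzip
import io


def decode_gzip_response(raw):
    """Split a raw HTTP response (bytes) and gunzip its body."""
    # Headers end at the first blank line; everything after it is the body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    # GzipFile needs a binary file-like object, so wrap the body in BytesIO.
    with gzip.GzipFile(fileobj=io.BytesIO(body)) as gz:
        return gz.read()


# Hypothetical sample response, built here only so the sketch is runnable.
payload = b'{"gzipped": true}'
compressed = gzip.compress(payload)
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Content-Length: " + str(len(compressed)).encode("ascii") + b"\r\n"
    b"\r\n"
) + compressed

print(decode_gzip_response(raw_response).decode("utf-8"))
# prints {"gzipped": true}
```

Like the original, this assumes the body is not sent with chunked transfer encoding; a chunked body would need to be reassembled before decompression.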

If you feel compelled to parse the full HTTP response message, then, as inspired by this answer, here is a rather roundabout way to do it: construct an httplib.HTTPResponse directly from the raw HTTP response, use that to create a urllib3.response.HTTPResponse, and then access the decompressed data:

import httplib
from cStringIO import StringIO
from urllib3.response import HTTPResponse

http = 'HTTP/1.1 200 OK\r\nServer: nginx\r\nDate: Tue, 09 Feb 2016 12:02:25 GMT\r\nContent-Type: application/json\r\nContent-Length: 115\r\nConnection: close\r\nContent-Encoding: gzip\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Credentials: true\r\n\r\n\x1f\x8b\x08\x00\xa0\xda\xb9V\x02\xff\xab\xe6RPPJ\xaf\xca,(HMQ\xb2R()*M\xd5Q\x00\x89e\xa4&\xa6\xa4\x16\x15\x03\xc5\xaa\x81\\\xa0\x80G~q\t\x90\xa7\x94QRR\x90\x94\x99\xa7\x97_\x94\xae\x04\x94\xa9\x85(\xcfM-\xc9\xc8\x07\x99\xa0\xe4\xee\x1a\xa2\x04\x11\xcb/\xcaL\xcf\xcc\x03\x89\x19Z\x1a\xe9\x19\x9aY\xe8\x19\xea\x19*q\xd5r\x01\x00\r(\xafRu\x00\x00\x00'

class DummySocket(object):
    def __init__(self, data):
        self._data = StringIO(data)
    def makefile(self, *args, **kwargs):
        return self._data

response = httplib.HTTPResponse(DummySocket(http))
response.begin()
response = HTTPResponse.from_httplib(response)
print(response.data)

Output

{
  "gzipped": true, 
  "headers": {
    "Host": "httpbin.org"
  }, 
  "method": "GET", 
  "origin": "192.168.1.1"
}
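The code above uses the Python 2 httplib module. In Python 3 the same dummy-socket trick can be sketched with http.client from the standard library, with the decompression done explicitly by gzip.decompress instead of by urllib3. The names FakeSocket and parse_and_decode and the sample response are hypothetical; treat this as an illustration under those assumptions rather than a drop-in replacement.

```python
import gzip
import io
from http.client import HTTPResponse


class FakeSocket:
    """Minimal socket stand-in: HTTPResponse only calls makefile() on it."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_and_decode(raw):
    """Parse a raw HTTP response (bytes) and gunzip the body if needed."""
    response = HTTPResponse(FakeSocket(raw))
    response.begin()        # parse the status line and headers
    body = response.read()  # read the body according to the parsed headers
    if response.getheader("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return response.status, body


# Hypothetical sample response, built here only so the sketch is runnable.
payload = b'{"gzipped": true}'
compressed = gzip.compress(payload)
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Content-Length: " + str(len(compressed)).encode("ascii") + b"\r\n"
    b"\r\n"
) + compressed

status, body = parse_and_decode(raw_response)
print(status, body.decode("utf-8"))
# prints 200 {"gzipped": true}
```

Letting the HTTP parser find the body avoids relying on a manual split at the first blank line.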

Although gzip uses zlib, when Content-Encoding is set to gzip the compressed stream is preceded by an additional gzip header, which a plain zlib.decompress(data) call does not interpret. Passing 16 + zlib.MAX_WBITS as the second argument tells zlib to expect that gzip header, but the data must then contain only the response body, with the HTTP headers already stripped.
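To illustrate how the wbits argument changes what zlib accepts, here is a small Python 3 sketch. The sample payload and variable names are made up for the example. It also shows one possible reason the call in the question failed: the input may still have had the HTTP headers attached.

```python
import gzip
import zlib

payload = b"<html>hello</html>"
gz_body = gzip.compress(payload)

# Default wbits expects a zlib header, so gzip data is rejected.
try:
    zlib.decompress(gz_body)
    plain_call_ok = True
except zlib.error:
    plain_call_ok = False

# 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
gzip_mode = zlib.decompress(gz_body, 16 + zlib.MAX_WBITS)

# The same call fails if HTTP headers are still attached to the data.
with_headers = b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n" + gz_body
try:
    zlib.decompress(with_headers, 16 + zlib.MAX_WBITS)
    headers_ok = True
except zlib.error:
    headers_ok = False

print(plain_call_ok, gzip_mode == payload, headers_ok)
# prints False True False
```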

Put your data in a file-like object and pass it through the gzip module. For example, something like:

import cStringIO
import gzip

# data must hold only the gzip-compressed body (Python 2 str)
mydatafile = cStringIO.StringIO(data)
gzipper = gzip.GzipFile(fileobj=mydatafile)
decdata = gzipper.read()

This is taken from my already old HTTP library for Python 2.x.
