
Log all requests from the python-requests module

I am using python Requests. I need to debug some OAuth activity, and for that I would like it to log all requests being performed. I could get this information with ngrep, but unfortunately it is not possible to grep https connections (which are needed for OAuth).

How can I activate logging of all URLs (+ parameters) that Requests is accessing?

You need to enable debugging at the httplib level (requests → urllib3 → httplib).

Here are some functions to either toggle logging (..._on() and ..._off()) or have it on temporarily:

import logging
import contextlib
try:
    from http.client import HTTPConnection # py3
except ImportError:
    from httplib import HTTPConnection # py2

def debug_requests_on():
    '''Switches on logging of the requests module.'''
    HTTPConnection.debuglevel = 1

    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True

def debug_requests_off():
    '''Switches off logging of the requests module, might be some side-effects'''
    HTTPConnection.debuglevel = 0

    root_logger = logging.getLogger()
    root_logger.setLevel(logging.WARNING)
    root_logger.handlers = []
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.WARNING)
    requests_log.propagate = False

@contextlib.contextmanager
def debug_requests():
    '''Use with 'with'!'''
    debug_requests_on()
    try:
        yield
    finally:
        # switch logging back off even if the block raises
        debug_requests_off()

Demo use:

>>> requests.get('http://httpbin.org/')
<Response [200]>

>>> debug_requests_on()
>>> requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 12150
send: 'GET / HTTP/1.1\r\nHost: httpbin.org\r\nConnection: keep-alive\r\nAccept-
Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.11.1\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
...
<Response [200]>

>>> debug_requests_off()
>>> requests.get('http://httpbin.org/')
<Response [200]>

>>> with debug_requests():
...     requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
...
<Response [200]>

You will see the REQUEST, including HEADERS and DATA, and the RESPONSE with HEADERS but without DATA. The only thing missing will be the response body, which is not logged.

Source

The underlying urllib3 library logs all new connections and URLs with the logging module, but not POST bodies. For GET requests this should be enough:

import logging

logging.basicConfig(level=logging.DEBUG)

which gives you the most verbose logging option; see the logging HOWTO for more details on how to configure logging levels and destinations.

Short demo:

>>> import requests
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366

Depending on the exact version of urllib3, the following messages are logged:

  • INFO: Redirects
  • WARN: Connection pool full (if this happens often, increase the connection pool size)
  • WARN: Failed to parse headers (response headers with an invalid format)
  • WARN: Retrying the connection
  • WARN: Certificate did not match expected hostname
  • WARN: Received response with both Content-Length and Transfer-Encoding, when processing a chunked response
  • DEBUG: New connections (HTTP or HTTPS)
  • DEBUG: Dropped connections
  • DEBUG: Connection details: method, path, HTTP version, status code and response length
  • DEBUG: Retry count increments
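
If you want the INFO and WARNING messages from that list without the per-request DEBUG lines, you can set a level on the urllib3 logger separately from the root logger. A minimal sketch (the URL is just a placeholder):

import logging
import requests

# Verbose logging everywhere else, but cap urllib3 at INFO so its
# per-connection DEBUG chatter is suppressed while WARNING-level
# messages (pool full, retries, ...) still come through.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('urllib3').setLevel(logging.INFO)

requests.get('http://httpbin.org/get')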

Note that none of this includes headers or bodies. urllib3 uses the http.client.HTTPConnection class to do the grunt work, but that class doesn't support logging; it can normally only be configured to print to stdout. However, you can rig it to send all debug information to logging instead, by introducing an alternative print name into that module:

import logging
import http.client

httpclient_logger = logging.getLogger("http.client")

def httpclient_logging_patch(level=logging.DEBUG):
    """Enable HTTPConnection debug logging to the logging framework"""

    def httpclient_log(*args):
        httpclient_logger.log(level, " ".join(args))

    # mask the print() built-in in the http.client module to use
    # logging instead
    http.client.print = httpclient_log
    # enable debugging
    http.client.HTTPConnection.debuglevel = 1

Calling httpclient_logging_patch() causes http.client connections to output all debug information to a standard logger, so it is picked up by logging.basicConfig():

>>> httpclient_logging_patch()
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:http.client:send: b'GET /get?foo=bar&baz=python HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:http.client:reply: 'HTTP/1.1 200 OK\r\n'
DEBUG:http.client:header: Date: Tue, 04 Feb 2020 13:36:53 GMT
DEBUG:http.client:header: Content-Type: application/json
DEBUG:http.client:header: Content-Length: 366
DEBUG:http.client:header: Connection: keep-alive
DEBUG:http.client:header: Server: gunicorn/19.9.0
DEBUG:http.client:header: Access-Control-Allow-Origin: *
DEBUG:http.client:header: Access-Control-Allow-Credentials: true
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366

For those using Python 3+:

import requests
import logging
import http.client

http.client.HTTPConnection.debuglevel = 1

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

When trying to get the Python logging system (import logging) to emit low-level debug log messages, it surprised me to discover that given:

requests --> urllib3 --> http.client.HTTPConnection

only urllib3 actually uses the Python logging system:

  • requests: no
  • http.client.HTTPConnection: no
  • urllib3: yes

Sure, you can extract debug messages from HTTPConnection by setting:

HTTPConnection.debuglevel = 1

but these outputs are merely emitted via the print statement. To prove this, simply grep the Python 3.7 client.py source code and view the print statements yourself (thanks @Yohann):

curl https://raw.githubusercontent.com/python/cpython/3.7/Lib/http/client.py | grep -A1 debuglevel

Presumably redirecting stdout in some way might work to shoehorn stdout into the logging system and potentially capture it to, e.g., a log file.
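
As a rough sketch of that idea (not part of the original answer; the LoggerWriter helper and the log file name are made up for illustration), the print() output of http.client can be redirected into a logger:

import io
import logging
import contextlib
import requests
from http.client import HTTPConnection

class LoggerWriter(io.TextIOBase):
    '''A stand-in for sys.stdout that forwards complete lines to a logger.'''
    def __init__(self, logger, level=logging.DEBUG):
        self.logger = logger
        self.level = level
        self._buffer = ''

    def write(self, message):
        self._buffer += message
        while '\n' in self._buffer:
            line, self._buffer = self._buffer.split('\n', 1)
            if line.strip():
                self.logger.log(self.level, line)
        return len(message)

logging.basicConfig(level=logging.DEBUG, filename='requests_debug.log')
HTTPConnection.debuglevel = 1

# While stdout is redirected, http.client's print() calls end up in the log file.
with contextlib.redirect_stdout(LoggerWriter(logging.getLogger('http.client'))):
    requests.get('http://httpbin.org/')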

Choose the 'urllib3' logger, not 'requests.packages.urllib3'

To capture urllib3 debug information through the Python 3 logging system, contrary to much advice on the internet, and as @MikeSmith points out, you won't have much luck intercepting:

log = logging.getLogger('requests.packages.urllib3')

instead you need to:

log = logging.getLogger('urllib3')

Debugging urllib3 to a log file

Here is some code which logs urllib3 workings to a log file using the Python logging system:

import requests
import logging
from http.client import HTTPConnection  # py3

# log = logging.getLogger('requests.packages.urllib3')  # useless
log = logging.getLogger('urllib3')  # works

log.setLevel(logging.DEBUG)  # needed
fh = logging.FileHandler("requests.log")
log.addHandler(fh)

requests.get('http://httpbin.org/')

the result:

Starting new HTTP connection (1): httpbin.org:80
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168

Enabling the HTTPConnection.debuglevel print() statements

If you set HTTPConnection.debuglevel = 1:

from http.client import HTTPConnection  # py3
HTTPConnection.debuglevel = 1
requests.get('http://httpbin.org/')

you'll get the print statement output of additional juicy low-level info:

send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python- 
requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin 
header: Content-Encoding header: Content-Type header: Date header: ...

Remember this output uses print and not the Python logging system, and thus cannot be captured using a traditional logging stream or file handler (though it may be possible to capture output to a file by redirecting stdout).

Combine the two above - maximise all possible logging to the console

To maximise all possible logging, you must settle for console/stdout output with this:

import requests
import logging
from http.client import HTTPConnection  # py3

log = logging.getLogger('urllib3')
log.setLevel(logging.DEBUG)

# logging from urllib3 to console
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
log.addHandler(ch)

# print statements from `http.client.HTTPConnection` to console/stdout
HTTPConnection.debuglevel = 1

requests.get('http://httpbin.org/')

giving the full range of output:

Starting new HTTP connection (1): httpbin.org:80
send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin 
header: Content-Encoding header: ...

When you have a script, or even a subsystem of an application, to debug at the network-protocol level, you want to see exactly what the request-response pairs are, including effective URLs, headers, payloads and status codes. It's typically impractical to instrument individual requests all over the place. At the same time, there are performance considerations that suggest using a single (or a few specialised) requests.Session, so the following assumes that suggestion is followed.

requests supports so-called event hooks (as of 2.23 there's actually only the response hook). It's basically an event listener, and the event is emitted before returning control from requests.request. At this moment both the request and the response are fully defined, hence they can be logged.

import logging

import requests


logger = logging.getLogger('httplogger')

def logRoundtrip(response, *args, **kwargs):
    extra = {'req': response.request, 'res': response}
    logger.debug('HTTP roundtrip', extra=extra)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)

That's basically how to log all HTTP round-trips of a session.

Formatting HTTP round-trip log records

For the logging above to be useful, there can be a specialised logging formatter that understands the req and res extras on log records. It can look like this:

import textwrap

class HttpFormatter(logging.Formatter):   

    def _formatHeaders(self, d):
        return '\n'.join(f'{k}: {v}' for k, v in d.items())

    def formatMessage(self, record):
        result = super().formatMessage(record)
        if record.name == 'httplogger':
            result += textwrap.dedent('''
                ---------------- request ----------------
                {req.method} {req.url}
                {reqhdrs}

                {req.body}
                ---------------- response ----------------
                {res.status_code} {res.reason} {res.url}
                {reshdrs}

                {res.text}
            ''').format(
                req=record.req,
                res=record.res,
                reqhdrs=self._formatHeaders(record.req.headers),
                reshdrs=self._formatHeaders(record.res.headers),
            )

        return result

formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logging.basicConfig(level=logging.DEBUG, handlers=[handler])

Now if you do some requests using the session, like:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

The output to stderr will look as follows:

2020-05-14 22:10:13,224 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): httpbin.org:443
2020-05-14 22:10:13,695 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
2020-05-14 22:10:13,698 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/user-agent
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/user-agent
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: application/json
Content-Length: 45
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

{
  "user-agent": "python-requests/2.23.0"
}


2020-05-14 22:10:13,814 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
2020-05-14 22:10:13,818 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/status/200
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/status/200
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

A GUI way

When you have a lot of queries, having a simple UI and a way to filter records comes in handy. I'll show how to use Chronologer for that (which I'm the author of).

First, the hook has to be rewritten to produce records that logging can serialise when sending them over the wire. It can look like this:

def logRoundtrip(response, *args, **kwargs): 
    extra = {
        'req': {
            'method': response.request.method,
            'url': response.request.url,
            'headers': response.request.headers,
            'body': response.request.body,
        }, 
        'res': {
            'code': response.status_code,
            'reason': response.reason,
            'url': response.url,
            'headers': response.headers,
            'body': response.text
        },
    }
    logger.debug('HTTP roundtrip', extra=extra)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)

Second, logging configuration has to be adapted to use logging.handlers.HTTPHandler (which Chronologer understands).

import logging.handlers

chrono = logging.handlers.HTTPHandler(
  'localhost:8080', '/api/v1/record', 'POST', credentials=('logger', ''))
handlers = [logging.StreamHandler(), chrono]
logging.basicConfig(level=logging.DEBUG, handlers=handlers)

Finally, run a Chronologer instance, e.g. using Docker:

docker run --rm -it -p 8080:8080 -v /tmp/db \
    -e CHRONOLOGER_STORAGE_DSN=sqlite:////tmp/db/chrono.sqlite \
    -e CHRONOLOGER_SECRET=example \
    -e CHRONOLOGER_ROLES="basic-reader query-reader writer" \
    saaj/chronologer \
    python -m chronologer -e production serve -u www-data -g www-data -m

And run the requests again:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

The stream handler will produce:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org:443
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
DEBUG:httplogger:HTTP roundtrip
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
DEBUG:httplogger:HTTP roundtrip

Now if you open http://localhost:8080/ (use "logger" as the username and an empty password in the basic auth popup) and click the "Open" button, you should see something like:

[Screenshot of Chronologer]

Just improving this answer

This is how it worked for me:

import logging
import sys    
import requests
import textwrap
    
root = logging.getLogger('httplogger')


def logRoundtrip(response, *args, **kwargs):
    extra = {'req': response.request, 'res': response}
    root.debug('HTTP roundtrip', extra=extra)
    

class HttpFormatter(logging.Formatter):

    def _formatHeaders(self, d):
        return '\n'.join(f'{k}: {v}' for k, v in d.items())

    def formatMessage(self, record):
        result = super().formatMessage(record)
        if record.name == 'httplogger':
            result += textwrap.dedent('''
                ---------------- request ----------------
                {req.method} {req.url}
                {reqhdrs}

                {req.body}
                ---------------- response ----------------
                {res.status_code} {res.reason} {res.url}
                {reshdrs}

                {res.text}
            ''').format(
                req=record.req,
                res=record.res,
                reqhdrs=self._formatHeaders(record.req.headers),
                reshdrs=self._formatHeaders(record.res.headers),
            )

        return result

formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
root.addHandler(handler)
root.setLevel(logging.DEBUG)


session = requests.Session()
session.hooks['response'].append(logRoundtrip)
session.get('http://httpbin.org')

I'm using Python 3.4, requests 2.19.1:

'urllib3' is the logger to get now (no longer 'requests.packages.urllib3'). Basic logging will still happen without setting http.client.HTTPConnection.debuglevel.

I'm using a logger_config.yaml file to configure my logging, and to get those logs to show up, all I had to do was add disable_existing_loggers: False to the end of it.

My logging setup is rather extensive and confusing, so I don't even know a good way to explain it here, but if someone's also using a YAML file to configure their logging, this might help.
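
For anyone configuring logging from a dict rather than a YAML file, a minimal sketch of the same point (the handler layout here is made up for illustration):

import logging.config
import requests  # imported first, so the urllib3 loggers already exist

# dictConfig's disable_existing_loggers defaults to True, which would silence
# the urllib3 loggers created when requests was imported above.
logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler', 'level': 'DEBUG'},
    },
    'root': {'level': 'DEBUG', 'handlers': ['console']},
})

requests.get('http://httpbin.org/')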

https://docs.python.org/3/howto/logging.html#configuring-logging
