Python requests splits TCP packets

Question

I am trying to script an HTTP POST request with Python.

When I send it with curl from bash, everything works. With Python, using either the requests or the urllib3 library, I get an error response from the API. The POST request carries information in the headers and as JSON in the request body.

What I noticed when I intercepted the packets with Wireshark: the curl request (which works) is one single packet of 374 bytes. The Python request (no difference between requests and urllib3 here) is split into 2 separate packets of 253 and 144 bytes.

Wireshark reassembles these without problems, and both requests seem to contain the complete information in the headers and the POST body. But the API I am trying to connect to answers with a not very helpful "Error when processing request".

As 253 bytes can't be the size limit of a TCP packet, what is the reason for this behavior? Is there a way to fix it?

EDIT:

bash:

curl 'http://localhost/test.php' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36' -H 'Content-Type: application/json' -d '{"key1":"value1","key2":"value2","key3":"value3"}'

python:

import requests, json

headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
"Content-Type": "application/json"}

data = {"key1":"value1", "key2":"value2", "key3":"value3"}

r=requests.post("http://localhost/test.php", headers=headers, data=json.dumps(data))

Answer 1

TCP is a data stream, not a series of messages. How the data stream is segmented into packets should have no bearing on how the stream is interpreted, by either the sender or the recipient. If the recipient actually behaves differently depending on the segmentation of the packets, then the recipient is broken.
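To illustrate what "not broken" means here, this is a minimal sketch of a receiver that reads one HTTP request from a stream no matter how the sender split it. The helper name read_http_request is made up for this example, and it assumes a Content-Length header is present (chunked encoding is not handled). The two sendall() calls imitate a client that writes the header and the body separately.

```python
import socket


def read_http_request(conn):
    """Read one HTTP request from a stream socket, however it was segmented."""
    buf = b""
    # Keep reading until the blank line that ends the header block has arrived.
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before the headers were complete")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")

    # Look up Content-Length (assumed present; chunked bodies are not handled).
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # Keep reading until the whole body has arrived.
    while len(body) < length:
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before the body was complete")
        body += chunk
    return head, body[:length]


# A connected pair of sockets stands in for client and server.
client, server = socket.socketpair()
payload = b'{"key1":"value1","key2":"value2","key3":"value3"}'
headers = (
    b"POST /test.php HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n\r\n"
)
client.sendall(headers)   # first write: headers only
client.sendall(payload)   # second write: body only
head, body = read_http_request(server)
client.close()
server.close()
```

A server that instead does a single recv() and treats the result as the whole request is the kind of broken recipient described above.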

While I've seen such broken systems, I've seen more systems that reject a request for other reasons, such as a wrong User-Agent, a missing Accept header, or similar. I would suggest you check this first before concluding that it must be the segmentation of the data stream.

As for why curl and requests behave differently: curl probably constructs the full request (header and body) first and sends it in one go, while requests constructs and sends the header first and then sends the body, i.e. it does two write operations, which might result in two packets.
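One way to test whether the segmentation really is the culprit is to send the same request by hand with a single write. The sketch below is only a diagnostic under stated assumptions: build_post and post_in_one_write are hypothetical helper names, the header set is an example, and a single sendall() of a small buffer usually ends up in one segment but TCP does not guarantee it.

```python
import socket


def build_post(host, path, headers, body):
    """Serialize a complete HTTP/1.1 POST request into one bytes object."""
    lines = [f"POST {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines).encode("ascii") + b"\r\n\r\n"
    return head + body


def post_in_one_write(host, port, request):
    """Send the prepared bytes with one sendall() call and return the raw reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        # "Connection: close" in the request lets us read until EOF.
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


body = b'{"key1":"value1","key2":"value2","key3":"value3"}'
request = build_post(
    "localhost",
    "/test.php",
    {"Content-Type": "application/json", "Connection": "close"},
    body,
)

# Example call (needs a server listening on localhost:80):
# print(post_in_one_write("localhost", 80, request).decode(errors="replace"))
```

If the API still rejects the request when it is sent this way, the segmentation is not the problem and the headers deserve a closer look.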

Answer 2

Although it should not matter for the issue you are having, there is a way to force the data from multiple sends into one packet, namely the TCP_CORK option on the socket (it is platform dependent, though).

Create an adapter first:

import socket

import requests
from requests.packages.urllib3.connection import HTTPConnection

class HTTPAdapterWithSocketOptions(requests.adapters.HTTPAdapter):
    def __init__(self, *args, **kwargs):
        # Must be set before super().__init__, which calls init_poolmanager.
        self.socket_options = kwargs.pop("socket_options", None)
        super(HTTPAdapterWithSocketOptions, self).__init__(*args, **kwargs)

    def init_poolmanager(self, *args, **kwargs):
        # Pass the extra socket options down to the urllib3 connection pool.
        if self.socket_options is not None:
            kwargs["socket_options"] = self.socket_options
        super(HTTPAdapterWithSocketOptions, self).init_poolmanager(*args, **kwargs)

Then use it for the requests you want to send out:

s = requests.Session()
options = HTTPConnection.default_socket_options + [ (socket.IPPROTO_TCP, socket.TCP_CORK, 1)]
adapter = HTTPAdapterWithSocketOptions(socket_options=options)
s.mount("http://", adapter)
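For reference, here is roughly what TCP_CORK does on a plain socket, outside of requests. This is a sketch, not what requests does internally: send_corked is a made-up helper, and the hasattr() guard is there because the socket module only exposes TCP_CORK on platforms that support it (Linux). The loopback connection only shows that the bytes arrive intact; the packet count would still have to be checked in Wireshark.

```python
import socket

# TCP_CORK is Linux-specific; elsewhere the helper falls back to plain sends.
HAS_CORK = hasattr(socket, "TCP_CORK")


def send_corked(sock, parts):
    """Send several buffers, asking the kernel to hold partial segments until done."""
    if HAS_CORK:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for part in parts:
            sock.sendall(part)
    finally:
        if HAS_CORK:
            # Removing the cork flushes whatever is still queued.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)


# Loopback demonstration: a listener and a client inside one process.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname(), timeout=5)
server, _ = listener.accept()

send_corked(client, [b"HEADER-PART\r\n\r\n", b"BODY-PART"])
client.close()

# Read until the client's close is seen as EOF.
received = b""
while True:
    chunk = server.recv(4096)
    if not chunk:
        break
    received += chunk
server.close()
listener.close()
```

Note that the adapter above sets the cork and never removes it. According to the Linux tcp(7) man page, corked output is held for at most 200 milliseconds, so a small request may go out with some added delay.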

Answer 3

Sadly, as @Steffen Ullrich explains, there are indeed very broken systems (even though they claim to be industry standards) that can't handle a request split across several TCP segments. Since my application/script is rather isolated and self-contained, I used a simpler workaround based on @Roeften's answer, which applies TCP_CORK to all connections.

Warning: this workaround makes sense only when you don't risk breaking any other functionality that relies on requests.

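# (6, 3, 1) is (socket.IPPROTO_TCP, socket.TCP_CORK, 1) on Linux; this replaces the default TCP_NODELAY option.
requests.packages.urllib3.connection.HTTPConnection.default_socket_options = [(6,3,1)]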
