
Python Requests splits TCP packet

I am trying to script an HTTP POST request with Python.

When I try it with curl from bash, everything works. With Python, using either the requests or the urllib3 library, I get an error response from the API. The POST request contains information in its headers and as JSON in the request body.

When I intercept the packets with Wireshark, I notice that the curl request (which works) is one single packet of 374 bytes. The Python request (no difference between requests and urllib3 here) is split into two separate packets of 253 and 144 bytes.


Wireshark reassembles these without problems, and together they seem to contain the complete header and POST body. But the API I am trying to connect to answers with a not very helpful "Error when processing request".

Since 253 bytes can't be the size limit of a TCP packet, what is the reason for this behavior? Is there a way to fix it?

EDIT:

bash:

curl 'http://localhost/test.php' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36' -H 'Content-Type: application/json' -d '{"key1":"value1","key2":"value2","key3":"value3"}'

python:

import requests, json

headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
"Content-Type": "application/json"}

data = {"key1":"value1", "key2":"value2", "key3":"value3"}

r=requests.post("http://localhost/test.php", headers=headers, data=json.dumps(data))
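Comparing the two requests byte for byte already shows one difference besides the segmentation: `json.dumps` puts a space after every `:` and `,` by default, while the body given to curl is compact. The following standard-library sketch shows how the `separators` argument produces a body identical to the curl one. It only changes the body bytes; it does not change how the data is split into packets.

```python
import json

data = {"key1": "value1", "key2": "value2", "key3": "value3"}

# Body exactly as passed to curl with -d
curl_body = '{"key1":"value1","key2":"value2","key3":"value3"}'

# Default separators are ", " and ": ", which add five spaces here
default_body = json.dumps(data)

# Compact separators reproduce the curl body byte for byte
compact_body = json.dumps(data, separators=(",", ":"))

print(len(default_body), len(compact_body))  # 54 49
print(compact_body == curl_body)  # True
```

The compact string can then be passed as `data=compact_body` to `requests.post` if an exact match with curl is wanted.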

TCP is a data stream and not a series of messages. The segmentation of the data stream into packets should be of no relevance to the interpretation of the data stream, neither in the sender nor in the recipient. If the recipient actually behaves differently based on the segmentation of the packets, then the recipient is broken.
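To illustrate why segmentation should not matter: a correct HTTP receiver keeps reading until it has the complete header block and then exactly `Content-Length` body bytes, however many `recv()` calls that takes. The sketch below uses only the standard library; `read_http_request` is a hypothetical helper (not part of any library) that handles only the simple `Content-Length` case, without chunked encoding or robust error handling. A local `socket.socketpair()` stands in for the network, and the sender deliberately writes the headers and the body in two separate calls.

```python
import socket


def read_http_request(sock):
    """Hypothetical helper: read one HTTP request that has a Content-Length body.

    The result does not depend on how the bytes were split across send()/recv().
    """
    buf = b""
    # 1. Accumulate until the blank line that terminates the header block
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before end of headers")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")

    # 2. Find Content-Length (header names are case-insensitive)
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # 3. Keep reading until the whole body has arrived
    while len(body) < length:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before end of body")
        body += chunk
    return head, body[:length]


body = b'{"key1":"value1","key2":"value2","key3":"value3"}'
head = (
    b"POST /test.php HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"\r\n"
)

sender, receiver = socket.socketpair()
# Two separate writes: headers first, then the body
sender.sendall(head)
sender.sendall(body)

req_head, req_body = read_http_request(receiver)
print(req_body == body)  # True
sender.close()
receiver.close()
```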

While I've seen such broken systems, I've seen more systems that reject a request for other reasons, such as a wrong user agent, a missing Accept header or similar. I would suggest you check this first before concluding that the segmentation of the data stream must be the cause.
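One way to follow this advice is to let requests build the request without sending it and compare its headers with the ones curl sends (`curl -v` prints them). This sketch uses `Session.prepare_request`, which merges the session's default headers into the request; the exact default values may differ between requests versions.

```python
import json

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
    "Content-Type": "application/json",
}
data = {"key1": "value1", "key2": "value2", "key3": "value3"}

session = requests.Session()
req = requests.Request("POST", "http://localhost/test.php",
                       headers=headers, data=json.dumps(data))
# Build the request exactly as the session would send it, without any network I/O
prepared = session.prepare_request(req)

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
print()
print(prepared.body)
```

Headers that curl typically does not send by default, such as `Accept-Encoding` and `Connection: keep-alive`, are worth ruling out before blaming the segmentation.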

As for why curl and requests behave differently: curl probably constructs the full request (headers and body) first and sends it in one go, while requests first constructs and sends the headers and then sends the body, i.e. it performs two write operations, which may result in two packets.
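For comparison, the curl-like behavior amounts to serializing the header block and the body into one buffer and handing it to the socket in a single `sendall()`. This standard-library sketch is an illustration, not what curl literally executes; `build_request` is a hypothetical helper, and a `socket.socketpair()` stands in for the connection so that no server is needed. On a real TCP connection, a single write of a few hundred bytes usually leaves as one segment, although TCP does not guarantee it.

```python
import json
import socket


def build_request(host, path, headers, body):
    """Hypothetical helper: serialize an HTTP/1.1 POST into one bytes object."""
    lines = [f"POST {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body)}")
    # A blank line ends the header block; the body follows immediately
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


body = json.dumps({"key1": "value1", "key2": "value2", "key3": "value3"},
                  separators=(",", ":")).encode()
request = build_request("localhost", "/test.php",
                        {"Content-Type": "application/json"}, body)

sender, receiver = socket.socketpair()
# One write call for headers and body together
sender.sendall(request)
sender.close()

# Read until the sender's close produces EOF
received = b""
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    received += chunk
receiver.close()
print(received.decode())
```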

Although it should not matter for the issue you are having, there is a way to make data from multiple sends go out in one packet, namely the TCP_CORK option on the socket (this is platform-dependent; TCP_CORK is Linux-specific).
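At the plain-socket level, the usual pattern is to set `TCP_CORK`, perform several writes, then clear the option, which sends whatever is still queued. The sketch assumes Linux (Python only exposes `socket.TCP_CORK` where the platform defines it, hence the `hasattr` check) and uses a throwaway listener on `127.0.0.1` to stay self-contained; `send_corked` is a hypothetical helper name. According to the Linux tcp(7) man page, output stays corked for at most 200 ms, so leaving the option permanently set on a connection, as the requests-level approach does, may hold back a small request for up to that long.

```python
import socket


def send_corked(sock, *parts):
    """Hypothetical helper: send several buffers with TCP_CORK set, then uncork."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for part in parts:
            sock.sendall(part)
    finally:
        # Clearing the option sends any queued partial frame
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)


if not hasattr(socket, "TCP_CORK"):
    raise SystemExit("TCP_CORK is not available on this platform")

head = (b"POST /test.php HTTP/1.1\r\nHost: localhost\r\n"
        b"Content-Type: application/json\r\nContent-Length: 49\r\n\r\n")
body = b'{"key1":"value1","key2":"value2","key3":"value3"}'

# Throwaway local listener so the example needs no external server
with socket.create_server(("127.0.0.1", 0)) as server:
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    send_corked(client, head, body)
    client.close()

    # Read until the client's close produces EOF
    received = b""
    while chunk := conn.recv(4096):
        received += chunk
    conn.close()

print(received == head + body)  # True
```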

Create an adapter first:

import socket

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.connection import HTTPConnection

class HTTPAdapterWithSocketOptions(HTTPAdapter):
    """Transport adapter that passes custom socket options down to urllib3."""

    def __init__(self, *args, **kwargs):
        # Must be set before HTTPAdapter.__init__, which calls init_poolmanager()
        self.socket_options = kwargs.pop("socket_options", None)
        super().__init__(*args, **kwargs)

    def init_poolmanager(self, *args, **kwargs):
        if self.socket_options is not None:
            kwargs["socket_options"] = self.socket_options
        super().init_poolmanager(*args, **kwargs)

Then use it for the requests you want to send out:

s = requests.Session()
# Keep urllib3's default options (TCP_NODELAY) and add TCP_CORK (Linux-specific)
options = HTTPConnection.default_socket_options + [(socket.IPPROTO_TCP, socket.TCP_CORK, 1)]
adapter = HTTPAdapterWithSocketOptions(socket_options=options)
s.mount("http://", adapter)

Sadly, there are indeed very broken systems, as @Steffen Ullrich explains (even though they claim to be industry standards), which can't handle a request split across multiple TCP segments. Since my application/script is rather isolated and self-contained, I used a simpler workaround based on @Roeften's answer, which applies TCP_CORK to all connections.

Warning: this workaround only makes sense when you don't risk breaking any other functionality that relies on requests.

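# (6, 3, 1) is (socket.IPPROTO_TCP, socket.TCP_CORK, 1) on Linux; this replaces urllib3's default TCP_NODELAY option
requests.packages.urllib3.connection.HTTPConnection.default_socket_options = [(6, 3, 1)]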

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM