Difference between Python "requests" and Linux "curl"

I have looked in several places, but nowhere could I find a satisfactory answer to this.

What are the differences between the Python "requests" module and the Linux "curl" command? Does "requests" use "curl" under the hood, or is it a totally different way of dealing with HTTP requests and responses?

For most requests they behave the same way (as they should), but sometimes I find a difference in the response, and it is really hard to figure out why.

E.g., using curl for a HEAD request:

curl --head https://historia.sherpadesk.com
HTTP/2 302 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:30 GMT
access-control-expose-headers: Request-Context
cache-control: private
location: /login/?ref=portal
set-cookie: ASP.NET_SessionId=nghpw4qp5cw2ntwmwfuxw3oi; path=/; HttpOnly; SameSite=Lax
content-length: 135
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

and if I use -L to follow redirects:

curl --head https://historia.sherpadesk.com -L
HTTP/2 302 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:37 GMT
access-control-expose-headers: Request-Context
cache-control: private
location: /login/?ref=portal
set-cookie: ASP.NET_SessionId=trzp0bql4nibswux5z5wfayy; path=/; HttpOnly; SameSite=Lax
content-length: 135
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

HTTP/2 302 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:38 GMT
access-control-expose-headers: Request-Context
location: https://app.sherpadesk.com/login/?ref=portal
content-length: 161
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

HTTP/2 200 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:39 GMT
access-control-expose-headers: Request-Context
cache-control: no-store, no-cache
expires: -1
pragma: no-cache
set-cookie: ASP.NET_SessionId=aqmnxu2s3qkri3sravsrs1cq; path=/; HttpOnly; SameSite=Lax
content-length: 8935
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

and here is the (debug) output when I use Python's requests module, requests.head(url):

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): historia.sherpadesk.com:443
send: b'HEAD / HTTP/1.1\r\nHost: historia.sherpadesk.com\r\nUser-Agent: python-requests/2.26.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden: Access is denied.\r\n'
header: Content-Length: 58
header: Content-Type: text/html
header: Date: Mon, 28 Feb 2022 20:36:18 GMT
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1
header: X-Content-Type-Options: nosniff
header: Strict-Transport-Security: max-age=31536000
DEBUG:urllib3.connectionpool:https://historia.sherpadesk.com:443 "HEAD / HTTP/1.1" 403 0
INFO:root:URL: https://historia.sherpadesk.com/
INFO:root:<Response [403]>

which just results in a 403 response code. The response is the same whether allow_redirects is True or False. I have also tried using a proxy with the Python code, as I thought the site might be recognising Python's request as a bot and blocking it, but that also fails. Also, if that were the case, why does curl succeed?
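For anyone who wants to reproduce a trace like the one above, here is a minimal sketch of one common way to turn on that kind of logging. It assumes the "send:", "reply:" and "header:" lines come from the standard library's http.client debug level, and the DEBUG/INFO lines from the logging module; the helper name enable_http_debug is hypothetical, and the original script may have been set up differently.

```python
import http.client
import logging

import requests


def enable_http_debug():
    """Turn on wire-level tracing for HTTP connections made in this process."""
    # http.client prints the raw "send:" / "reply:" / "header:" lines to stdout.
    http.client.HTTPConnection.debuglevel = 1
    # Route log records (including urllib3's connection pool messages) to stderr.
    logging.basicConfig(level=logging.DEBUG)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)


if __name__ == "__main__":
    enable_http_debug()
    url = "https://historia.sherpadesk.com/"  # URL from the question
    response = requests.head(url, allow_redirects=False, timeout=10)
    logging.info("URL: %s", response.url)
    logging.info(response)
```

The "send:" line is the useful part when comparing against curl, because it shows exactly which request line and headers left the client.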

So, my main question here is: what are the major differences between curl and requests that might cause different responses in certain cases? If possible, I would really like a thorough explanation that could help me debug and resolve these issues.

The two are different tools, but the problem here is related to the user agent.
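To see which headers requests adds on its own, including the User-Agent, something like the following sketch may help. It relies on requests.utils.default_headers(); the wrapper name default_request_headers is hypothetical, and the exact values depend on the installed requests version.

```python
import requests


def default_request_headers():
    """Return the headers requests sends when the caller supplies none."""
    return dict(requests.utils.default_headers())


if __name__ == "__main__":
    # Print each default header in "Name: value" form for comparison with curl.
    for name, value in default_request_headers().items():
        print(f"{name}: {value}")
```

These should correspond to the headers visible in the "send:" line of the debug output in the question.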

When I try with curl, specifying the python-requests user agent:

$ curl  --head -A "python-requests/2.26.0" https://historia.sherpadesk.com/ 
HTTP/2 403 
content-type: text/html
date: Mon, 28 Feb 2022 22:30:02 GMT
content-length: 58
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

With curl's default user agent:

$ curl --head  https://historia.sherpadesk.com/ 
HTTP/2 302
...

Apparently, they have some type of website security that blocks HTTP clients like python-requests, but for some reason not curl.
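The same experiment can be run from the Python side by overriding the User-Agent that requests sends. The following is only a sketch: the helper names build_head_request and head_status are hypothetical, "curl/7.81.0" is an assumed example of a curl-style user agent string (use whatever `curl --version` reports locally), and the server's filtering rules may have changed since this was written.

```python
import requests

URL = "https://historia.sherpadesk.com/"  # URL from the question


def build_head_request(url, user_agent=None):
    """Prepare (but do not send) a HEAD request, optionally overriding User-Agent."""
    session = requests.Session()
    if user_agent is not None:
        session.headers["User-Agent"] = user_agent
    prepared = session.prepare_request(requests.Request("HEAD", url))
    return session, prepared


def head_status(url, user_agent=None, timeout=10):
    """Send the HEAD request without following redirects; return the status code."""
    session, prepared = build_head_request(url, user_agent)
    with session:
        response = session.send(prepared, allow_redirects=False, timeout=timeout)
    return response.status_code


if __name__ == "__main__":
    print("default user agent:  ", head_status(URL))
    # Assumed curl-like user agent string, for illustration only.
    print("curl-like user agent:", head_status(URL, "curl/7.81.0"))
```

If the two status codes differ, that supports the idea that the server is filtering on the User-Agent header rather than on anything else that distinguishes the two clients.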
