Difference between Python "requests" and Linux "curl"

Question

I have tried several approaches, but I cannot find a satisfactory answer to this:

What are the differences between the Python "requests" module and the Linux "curl" command? Does "requests" use curl under the hood, or is it a completely different way of handling HTTP requests and responses?

For most requests, both behave the same way (as they should), but sometimes I get a different response, and it is really hard to figure out why.

For example, using curl for a HEAD request:

curl --head https://historia.sherpadesk.com
HTTP/2 302 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:30 GMT
access-control-expose-headers: Request-Context
cache-control: private
location: /login/?ref=portal
set-cookie: ASP.NET_SessionId=nghpw4qp5cw2ntwmwfuxw3oi; path=/; HttpOnly; SameSite=Lax
content-length: 135
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

and if I use -L to follow redirects:

curl --head https://historia.sherpadesk.com -L
HTTP/2 302 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:37 GMT
access-control-expose-headers: Request-Context
cache-control: private
location: /login/?ref=portal
set-cookie: ASP.NET_SessionId=trzp0bql4nibswux5z5wfayy; path=/; HttpOnly; SameSite=Lax
content-length: 135
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

HTTP/2 302 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:38 GMT
access-control-expose-headers: Request-Context
location: https://app.sherpadesk.com/login/?ref=portal
content-length: 161
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

HTTP/2 200 
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:39 GMT
access-control-expose-headers: Request-Context
cache-control: no-store, no-cache
expires: -1
pragma: no-cache
set-cookie: ASP.NET_SessionId=aqmnxu2s3qkri3sravsrs1cq; path=/; HttpOnly; SameSite=Lax
content-length: 8935
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

and here is the (debug) output when I use Python's requests module, requests.head(url):

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): historia.sherpadesk.com:443
send: b'HEAD / HTTP/1.1\r\nHost: historia.sherpadesk.com\r\nUser-Agent: python-requests/2.26.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden: Access is denied.\r\n'
header: Content-Length: 58
header: Content-Type: text/html
header: Date: Mon, 28 Feb 2022 20:36:18 GMT
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1
header: X-Content-Type-Options: nosniff
header: Strict-Transport-Security: max-age=31536000
DEBUG:urllib3.connectionpool:https://historia.sherpadesk.com:443 "HEAD / HTTP/1.1" 403 0
INFO:root:URL: https://historia.sherpadesk.com/
INFO:root:<Response [403]>

which just results in a 403 response code. The response is the same whether allow_redirects is True or False. I have also tried using a proxy with the Python code, as I thought the site might be blocking the request because it recognises Python's request as coming from a bot, but that also fails. Also, if that were the case, why does curl succeed?
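For reference, debug output like the one shown above (the send:/reply:/header: lines plus the urllib3 DEBUG records) can be produced roughly as follows. This is a minimal sketch, not necessarily the exact setup used for the log above; the helper name enable_http_debug is hypothetical, and the actual request is left commented out so the snippet runs without network access.

```python
import http.client
import logging


def enable_http_debug():
    """Turn on wire-level tracing for requests/urllib3 (hypothetical helper)."""
    # http.client prints "send:", "reply:" and "header:" lines when debuglevel > 0.
    http.client.HTTPConnection.debuglevel = 1
    # urllib3 emits its connection-pool messages through the logging module.
    logging.basicConfig(level=logging.DEBUG)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)


enable_http_debug()

# With tracing enabled, a call such as the following would print the exchange:
# import requests
# requests.head("https://historia.sherpadesk.com")
```

Comparing the "send:" line with the request headers printed by curl -v is one way to spot what the two clients send differently.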

So, my main question here is: what are the major differences between curl and requests that might cause different responses in certain cases? If possible, I would really like a thorough explanation that could help me debug and resolve these issues.

Answer 1

The two tools are different, but the problem here is related to the user agent.

When I try with curl, specifying the python-requests user agent:

$ curl  --head -A "python-requests/2.26.0" https://historia.sherpadesk.com/ 
HTTP/2 403 
content-type: text/html
date: Mon, 28 Feb 2022 22:30:02 GMT
content-length: 58
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000

With curl's default user agent:

$ curl --head  https://historia.sherpadesk.com/ 
HTTP/2 302
...

Apparently, they have some type of website security that blocks HTTP clients like python-requests but, for some reason, not curl.
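The same experiment can be run from the Python side by overriding the User-Agent header that requests sends. The following is a minimal sketch: the helper prepare_head and the value "curl/7.81.0" are illustrative assumptions (curl's default user agent has the form curl/<version>, but the exact version here is made up), and the request is only prepared, not sent, so the headers can be inspected offline.

```python
import requests

URL = "https://historia.sherpadesk.com/"
# Assumption: a curl-style user agent string; the version number is illustrative.
CURL_LIKE_UA = "curl/7.81.0"


def prepare_head(url, user_agent=None):
    """Build (but do not send) a HEAD request, optionally overriding User-Agent."""
    session = requests.Session()
    if user_agent is not None:
        # Session-level headers are merged into every request the session prepares.
        session.headers["User-Agent"] = user_agent
    return session.prepare_request(requests.Request("HEAD", url))


default_req = prepare_head(URL)
custom_req = prepare_head(URL, CURL_LIKE_UA)

print(default_req.headers["User-Agent"])  # the python-requests default
print(custom_req.headers["User-Agent"])   # the overridden value
```

To actually send it, something like requests.head(URL, headers={"User-Agent": CURL_LIKE_UA}, allow_redirects=True) should work; note that requests.head does not follow redirects unless allow_redirects=True is passed. Whether the server then answers with 302/200 instead of 403 has not been verified here and depends on how the site's filtering works.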

Solution 1
3 2022-02-28 22:31:38
