
PowerShell Invoke-WebRequest works but Python Requests does not

This is about a weird situation where PowerShell's Invoke-WebRequest works as intended and Python Requests does not.

I am trying to scrape an e-commerce site using Python. Part of the scraping is to test whether an item can be added to the cart. Using the Chrome Developer Tools (F12), I was able to extract the following PowerShell scripts.

Step 1 - Request a customer session

$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
  "Accept"="application/json, text/plain, */*"
  "Cache-Control"="no-cache"
  "DNT"="1"
  "sec-ch-ua-mobile"="?0"
  "sec-ch-ua-platform"="`"Windows`""
  "Origin"="https://www.hermes.com"
  "Sec-Fetch-Site"="same-site"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://www.hermes.com/"
  "Accept-Encoding"="gzip, deflate, br"
  "Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} | Select-Object -Expand RawContent

The response gives me an "ECOM_SESS" cookie along with a bunch of others.

I then pass the ECOM_SESS cookie to the next step.

Step 2 - Add to cart

$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$session.Cookies.Add((New-Object System.Net.Cookie("ECOM_SESS", "XXXXXXXXXXXXXXXX", "/", ".hermes.com")))
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method "POST" `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
  "Accept"="application/json, text/plain, */*"
  "DNT"="1"
  "sec-ch-ua-mobile"="?0"
  "sec-ch-ua-platform"="`"Windows`""
  "Origin"="https://www.hermes.com"
  "Sec-Fetch-Site"="same-site"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://www.hermes.com/"
  "Accept-Encoding"="gzip, deflate, br"
  "Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} `
-ContentType "application/json" `
-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"

With the PowerShell scripts above, the process works perfectly and I get a response from each of the two steps. Note that this is with a rotating IP proxy, which refreshes the IP on each request to prevent bot detection.

However, when I tried to integrate this into my Python code, I hit a captcha requirement at Step 2, irrespective of the proxy server used.

Here is the relevant Python code:

import requests
from requests.cookies import RequestsCookieJar

def main():
    url1= "https://bck.hermes.com/customer-session?locale=de_de"
    url2 = "https://bck.hermes.com/add-to-cart"
    proxies1 = {
        "http": "xxxxxxxxxxxxxxxxxx"
    }
    headers1 = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',         
            'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            'Accept': 'application/json, text/plain, */*',
            'Cache-Control': 'no-cache',
            'DNT': '1',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Windows"',
            'Origin': 'https://www.hermes.com',
            'Sec-Fetch-Site': 'same-site',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Dest': 'document',
            'Referer': 'https://www.hermes.com/',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    headers2 = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
            'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            'Accept': 'application/json, text/plain, */*',
            'DNT': '1',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Windows"',
            'Origin': 'https://www.hermes.com',
            'Sec-Fetch-Site': 'same-site',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Dest': 'empty',
            'Referer': 'https://www.hermes.com/',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    
    body2 = {"locale":"de_de","items":[{"category":"direct","sku":"H079082CCAC"}]}


    #Step 1

    f = requests.get(url1, headers=headers1,proxies=proxies1)
    print(f"1Response Body: {f.text}\n")
    ECOM_SESS = f.cookies['ECOM_SESS']
    cookieJar = RequestsCookieJar()
    cookieJar.set('ECOM_SESS', ECOM_SESS, domain='.hermes.com', path='/')

    #Step 2
    g = requests.post(url2, headers=headers2,cookies=cookieJar,proxies=proxies1,json=body2)
    print(f"2Response Body: {g.text}\n")

   

if __name__ == '__main__':
    main()

When I run this Python code, Step 1 returns the intended response, including the cookies needed for Step 2. However, Step 2 always results in a captcha response.

I am curious about the difference between the PowerShell Invoke-WebRequest method and the Python Requests method. There has to be something fundamentally different for the former to avoid the captcha completely and the latter to always get hit with it.

I would appreciate any thoughts and insights. Thanks!

I'm not sure what specifically about Requests is triggering the bot protection on the site, but based on this you might have luck using:

requests.request("POST", url2, headers=headers2, cookies=cookieJar, proxies=proxies1, json=body2)
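Since it is unclear which difference matters, one low-effort check is to diff the header sets the two clients are told to send. The helper below is only a sketch: `diff_headers` is a hypothetical name, and the two dictionaries are a small subset of the Step 1 headers from the question. It does surface one mismatch already present there: the Python `headers1` sets `Sec-Fetch-Dest` to `document`, while the PowerShell Step 1 sends `empty`.

```python
def diff_headers(expected, actual):
    """Return {name: (expected_value, actual_value)} for headers that differ.

    Header names are compared case-insensitively; a header missing on one
    side is reported with None for that side.
    """
    exp = {k.lower(): v for k, v in expected.items()}
    act = {k.lower(): v for k, v in actual.items()}
    return {
        name: (exp.get(name), act.get(name))
        for name in sorted(exp.keys() | act.keys())
        if exp.get(name) != act.get(name)
    }


# Subset of the Step 1 headers from the question, for illustration only.
powershell_step1 = {
    "Accept": "application/json, text/plain, */*",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
}
python_step1 = {
    "Accept": "application/json, text/plain, */*",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "document",
}

print(diff_headers(powershell_step1, python_step1))
# prints {'sec-fetch-dest': ('empty', 'document')}
```

This only compares the headers you pass in. To see what Requests actually put on the wire, including the defaults it adds, inspect `response.request.headers` on the returned response.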

Alternatively, you could try urllib3 instead of Requests.
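A minimal sketch of what the urllib3 route could look like, assuming an HTTP proxy that accepts Basic credentials. The function names, the `proxy.example` address, and the way the headers are passed in are placeholders, not something taken from the question. urllib3 has no cookie jar, so the ECOM_SESS value is sent by hand in a `Cookie` header. Keep in mind that Requests is itself built on top of urllib3, so this may well behave the same way.

```python
import json

import urllib3


def build_proxy_manager(proxy_url, username, password):
    """Create a ProxyManager that sends Basic proxy credentials."""
    proxy_headers = urllib3.make_headers(
        proxy_basic_auth=f"{username}:{password}"
    )
    return urllib3.ProxyManager(proxy_url, proxy_headers=proxy_headers)


def add_to_cart(http, ecom_sess, headers):
    """POST the add-to-cart body from the question through the proxy."""
    body = {
        "locale": "de_de",
        "items": [{"category": "direct", "sku": "H079082CCAC"}],
    }
    request_headers = dict(headers)
    request_headers["Content-Type"] = "application/json"
    # No cookie jar in urllib3: pass the session cookie explicitly.
    request_headers["Cookie"] = f"ECOM_SESS={ecom_sess}"
    return http.request(
        "POST",
        "https://bck.hermes.com/add-to-cart",
        body=json.dumps(body).encode("utf-8"),
        headers=request_headers,
    )
```

Usage would be along the lines of `http = build_proxy_manager("http://proxy.example:8080", "username", "password")` followed by `add_to_cart(http, ecom_sess, headers2)`, with `ecom_sess` taken from the Step 1 response.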

Here's your PowerShell script, simplified as well, just as an exercise.

$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
$headers = @{
"sec-ch-ua"='" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
  "DNT"="1"
  "sec-ch-ua-mobile"="?0"
  "sec-ch-ua-platform"="`"Windows`""
  "Origin"="https://www.hermes.com"
  "Sec-Fetch-Site"="same-site"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://www.hermes.com/"
}
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-SessionVariable session `
-Headers $headers
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method POST `
-WebSession $session `
-Headers $headers `
-ContentType "application/json" `
-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"
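A rough Python counterpart of the simplified script, using `requests.Session` in the role of `-SessionVariable` / `-WebSession` so that cookies from the first response (ECOM_SESS included, provided its domain matches) are carried over automatically. This is a sketch under assumptions: the function names and the `sku` parameter are illustrative, and the proxy URL is a placeholder. One thing it does differently from the question's code is register the proxy for both `http` and `https`. Requests picks a proxy by URL scheme, so a `proxies` dict with only an `"http"` key is not applied to `https://` URLs unless an environment variable supplies one, which may be worth ruling out here.

```python
import requests

CUSTOMER_SESSION_URL = "https://bck.hermes.com/customer-session?locale=de_de"
ADD_TO_CART_URL = "https://bck.hermes.com/add-to-cart"


def make_proxies(proxy_url):
    """Use the same proxy for both schemes; credentials can be embedded
    in the URL as http://username:password@host:port."""
    return {"http": proxy_url, "https": proxy_url}


def build_session(headers):
    """Create a session whose headers are sent on every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def add_to_cart(session, proxies, sku):
    """Run both steps; the session's cookie jar carries cookies across."""
    first = session.get(CUSTOMER_SESSION_URL, proxies=proxies)
    first.raise_for_status()
    body = {"locale": "de_de", "items": [{"category": "direct", "sku": sku}]}
    # json= serializes the body and sets Content-Type: application/json.
    return session.post(ADD_TO_CART_URL, json=body, proxies=proxies)
```

The proxies are passed per request rather than stored on the session, since per-request and environment proxy settings take precedence over `session.proxies` when Requests merges them.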

