Python requests.get() VS Powershell Invoke-WebRequest -Uri
Powershell Invoke-WebRequest works but Python Requests does not
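This is a strange case where PowerShell's Invoke-WebRequest works as expected while Python Requests does not.
I am trying to scrape an e-commerce website with Python. Part of the scrape is to test whether an item can be added to the cart. Using Chrome Developer Tools (F12), I was able to extract the following PowerShell script.
Step 1 - Request a customer session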
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
"Accept"="application/json, text/plain, */*"
"Cache-Control"="no-cache"
"DNT"="1"
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
"Origin"="https://www.hermes.com"
"Sec-Fetch-Site"="same-site"
"Sec-Fetch-Mode"="cors"
"Sec-Fetch-Dest"="empty"
"Referer"="https://www.hermes.com/"
"Accept-Encoding"="gzip, deflate, br"
"Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} | Select-Object -Expand RawContent
The response gives me an "ECOM_SESS" cookie along with a few other cookies.
I then pass the ECOM_SESS cookie on to the next step.
Step 2 - Add to cart
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$session.Cookies.Add((New-Object System.Net.Cookie("ECOM_SESS", "XXXXXXXXXXXXXXXX", "/", ".hermes.com")))
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method "POST" `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
"Accept"="application/json, text/plain, */*"
"DNT"="1"
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
"Origin"="https://www.hermes.com"
"Sec-Fetch-Site"="same-site"
"Sec-Fetch-Mode"="cors"
"Sec-Fetch-Dest"="empty"
"Referer"="https://www.hermes.com/"
"Accept-Encoding"="gzip, deflate, br"
"Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} `
-ContentType "application/json" `
-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"
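With the PowerShell script above, the process works perfectly and I get a response from each of the two steps. Note that this is a rotating IP proxy that refreshes the IP on every request to prevent bot detection.
However, when I try to integrate this into my Python code, I hit a captcha requirement at step 2, no matter which proxy server I use.
Here is the relevant Python code: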
from __future__ import print_function
import bs4
import requests
from requests.cookies import RequestsCookieJar
import jsons

def main():
    url1 = "https://bck.hermes.com/customer-session?locale=de_de"
    url2 = "https://bck.hermes.com/add-to-cart"
    proxies1 = {
        "http": "xxxxxxxxxxxxxxxxxx"
    }
    headers1 = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
        'Accept': 'application/json, text/plain, */*',
        'Cache-Control': 'no-cache',
        'DNT': '1',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'Origin': 'https://www.hermes.com',
        'Sec-Fetch-Site': 'same-site',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Dest': 'document',
        'Referer': 'https://www.hermes.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    headers2 = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
        'Accept': 'application/json, text/plain, */*',
        'DNT': '1',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'Origin': 'https://www.hermes.com',
        'Sec-Fetch-Site': 'same-site',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Dest': 'empty',
        'Referer': 'https://www.hermes.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    body2 = {"locale": "de_de", "items": [{"category": "direct", "sku": "H079082CCAC"}]}

    # Step 1
    f = requests.get(url1, headers=headers1, proxies=proxies1)
    print(f"1Response Body: {f.text}\n")
    ECOM_SESS = f.cookies['ECOM_SESS']
    cookieJar = RequestsCookieJar()
    cookieJar.set('ECOM_SESS', ECOM_SESS, domain='.hermes.com', path='/')

    # Step 2
    g = requests.post(url2, headers=headers2, cookies=cookieJar, proxies=proxies1, json=body2)
    print(f"2Response Body: {g.text}\n")

if __name__ == '__main__':
    main()
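Running the Python code, step 1 returns the expected response, including the cookies that need to be passed to step 2. Step 2, however, always produces a captcha response.
I'm just curious about the difference between the PowerShell Invoke-WebRequest approach and the Python Requests approach, since there must be something fundamentally different about the former for it to avoid the captcha entirely while the latter always gets hit by it.
Any thoughts and insights are appreciated. Thanks!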
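One cheap check before comparing the libraries themselves is to diff the header sets that the two step 1 scripts send. The sketch below copies a subset of the header values from the scripts in the question; `diff_headers` is a hypothetical helper written for illustration, not part of either script. Whether the difference it reports has anything to do with the captcha is not established here.

```python
# Header values copied from the step 1 scripts in the question.
# Only a subset of the shared headers is listed to keep the sketch short.
powershell_headers = {
    "Accept": "application/json, text/plain, */*",
    "Cache-Control": "no-cache",
    "Sec-Fetch-Site": "same-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
}
python_headers = {
    "Accept": "application/json, text/plain, */*",
    "Cache-Control": "no-cache",
    "Sec-Fetch-Site": "same-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "document",
}


def diff_headers(left, right):
    """Return {name: (left_value, right_value)} for headers that differ."""
    names = sorted(set(left) | set(right))
    return {
        name: (left.get(name), right.get(name))
        for name in names
        if left.get(name) != right.get(name)
    }


differences = diff_headers(powershell_headers, python_headers)
for name, (ps_value, py_value) in differences.items():
    print(f"{name}: PowerShell={ps_value!r} Python={py_value!r}")
```

For step 1, the Python script sends `Sec-Fetch-Dest: document` where the PowerShell script sends `empty`. The step 2 headers are the same in both scripts.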
I'm not sure exactly which part of the request triggers the bot protection on the site, but based on this, you might have some luck with:
requests.request("POST", url2, headers=headers2, cookies=cookieJar, proxies=proxies1, json=body2)
Alternatively, you could try using urllib3 instead of Requests.
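A minimal sketch of what the urllib3 route might look like, assuming the same placeholder proxy address and credentials as in the question. urllib3 has no cookie jar, so the ECOM_SESS cookie is read from the Set-Cookie headers and sent back by hand; `make_proxy_manager`, `extract_cookie` and `add_to_cart` are hypothetical helper names. There is no guarantee that this avoids the captcha.

```python
import json
from http.cookies import SimpleCookie

import urllib3

URL1 = "https://bck.hermes.com/customer-session?locale=de_de"
URL2 = "https://bck.hermes.com/add-to-cart"
PROXY_URL = "http://proxyaddress"   # placeholder from the question
PROXY_AUTH = "username:password"    # placeholder credentials
BODY = {"locale": "de_de", "items": [{"category": "direct", "sku": "H079082CCAC"}]}


def make_proxy_manager():
    """Build a pool manager that sends every request through the proxy."""
    return urllib3.ProxyManager(
        PROXY_URL,
        proxy_headers=urllib3.make_headers(proxy_basic_auth=PROXY_AUTH),
    )


def extract_cookie(set_cookie_values, name):
    """Return the value of cookie `name` from a list of Set-Cookie strings."""
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)
        if name in jar:
            return jar[name].value
    return None


def add_to_cart(http, headers):
    """Run step 1 and step 2; `headers` is the header dict from the question."""
    # Step 1: request the customer session and pick out ECOM_SESS.
    first = http.request("GET", URL1, headers=headers)
    ecom_sess = extract_cookie(first.headers.getlist("Set-Cookie"), "ECOM_SESS")

    # Step 2: send the cookie back manually along with the JSON body.
    step2_headers = dict(headers)
    step2_headers["Cookie"] = f"ECOM_SESS={ecom_sess}"
    step2_headers["Content-Type"] = "application/json"
    return http.request(
        "POST",
        URL2,
        body=json.dumps(BODY).encode("utf-8"),
        headers=step2_headers,
    )
```

Calling `add_to_cart(make_proxy_manager(), headers2)` with the `headers2` dict from the question would perform both requests. Nothing is sent until that call is made.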
Here is your PowerShell script, also simplified as an exercise.
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
$headers = @{
"sec-ch-ua"='" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
"DNT"="1"
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
"Origin"="https://www.hermes.com"
"Sec-Fetch-Site"="same-site"
"Sec-Fetch-Mode"="cors"
"Sec-Fetch-Dest"="empty"
"Referer"="https://www.hermes.com/"
}
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-SessionVariable session `
-Headers $headers
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method POST `
-WebSession $session `
-Headers $headers `
-ContentType "application/json" `
-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"
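For comparison, here is a rough Python counterpart to the simplified script, offered as a sketch rather than a confirmed fix. It uses `requests.Session`, which carries cookies from the first response into the second request much as `-SessionVariable` / `-WebSession` do. One thing worth checking along the way: Requests picks a proxy by URL scheme, and the `proxies1` dict in the question only has an `"http"` key while both URLs are `https`, so those requests may not be going through the proxy at all. The proxy URL format with embedded credentials and the names `build_session` and `add_to_cart` are assumptions made for illustration.

```python
import requests
from requests.utils import select_proxy

URL1 = "https://bck.hermes.com/customer-session?locale=de_de"
URL2 = "https://bck.hermes.com/add-to-cart"
PROXY = "http://username:password@proxyaddress"  # placeholder values

HEADERS = {
    # User-Agent taken from the question; the simplified script omits it.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36",
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
    "DNT": "1",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "Origin": "https://www.hermes.com",
    "Sec-Fetch-Site": "same-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Referer": "https://www.hermes.com/",
}
BODY = {"locale": "de_de", "items": [{"category": "direct", "sku": "H079082CCAC"}]}


def build_session():
    """Create a session that reuses headers, cookies and proxy settings."""
    session = requests.Session()
    session.headers.update(HEADERS)
    # Both target URLs are https, so an "https" entry is needed as well.
    session.proxies.update({"http": PROXY, "https": PROXY})
    return session


def add_to_cart(session):
    """Step 1 sets the cookies on the session; step 2 sends them back."""
    first = session.get(URL1)
    second = session.post(URL2, json=BODY)
    return first, second


# Offline check of which proxy Requests would select for an https URL.
print(select_proxy(URL2, {"http": PROXY}))   # "http" key only: no proxy selected
print(select_proxy(URL2, build_session().proxies))
```

Calling `add_to_cart(build_session())` performs the two requests. The two `select_proxy` lines run without network access and only show how the proxy lookup behaves.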