Selenium python return history 302
I ran into a problem while trying to scrape content from a website by calling its API.
I use this code to log in to the page:
import time
from selenium import webdriver
options = webdriver.ChromeOptions()
#options.add_argument("--headless") # Runs Chrome in headless mode.
options.add_argument('--no-sandbox') # Bypass OS security model
options.add_argument('--disable-gpu') # applicable to windows os only
options.add_argument('disable-infobars')
options.add_argument("--disable-extensions")
options.add_argument("AppData\\Local\\Google\\Chrome\\User Data\\Defaul") #Path to your chrome profile
driver = webdriver.Chrome(executable_path="\\chromedriver.exe", chrome_options=options)
#driver=webdriver.Chrome()
site="https:xxxxxx"
# call open browser function
# Github credentials
username = "username"
password = "password"
# head to github login page
driver.get(site)
# find username/email field and send the username itself to the input field
driver.find_element_by_id("username").send_keys(username)
time.sleep(20)
# find password input field and insert password as well
driver.find_element_by_id("password").send_keys(password)
time.sleep(25)
# click login button
driver.find_element_by_id("btnLogin").click()
On the login page I see a login form whose token value changes every time I log in:
id="FormLogin" method="post" style="margin-top: 1%;"><input name="__RequestVerificationToken" type="hidden" value="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
After the login page, it returns an OTP:
otp="xxxxxx"
driver.find_element_by_id("otp").send_keys(otp)
driver.find_element_by_xpath("/html/body/div/div/div/div/button[1]").click()
After logging in I can search and see the results, but whenever I try to scrape the results through the API, it returns none.
I tried to save my session:
s = requests.Session()
# Set correct user agent
selenium_user_agent = driver.execute_script("return navigator.userAgent;")
s.headers.update({"user-agent": selenium_user_agent})
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])
Then I use this code to scrape the content:
import requests
base_url = 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxPageNumber=1&CurrentSystemDate=12/08/2022&X-Requested-With=XMLHttpRequest'
response = s.get(base_url)
print(response.status_code)  # returns 200
print(response.history)  # shows a 302 redirect
print(response.content)  # returns the login page content
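A 200 status together with a 302 in `response.history` means the request was redirected and the final page, here the login page, is what came back. The following is a minimal sketch for listing each hop and its `Location` header. The helper names and the `"login"` marker are hypothetical, and the stand-in objects only imitate the attributes of a `requests` response so the sketch runs without a network.

```python
from types import SimpleNamespace


def describe_redirects(response):
    """Return (status_code, url, location) for every hop, ending with the final response."""
    hops = []
    for hop in response.history:
        hops.append((hop.status_code, hop.url, hop.headers.get("Location")))
    hops.append((response.status_code, response.url, None))
    return hops


def looks_like_login_redirect(response, login_marker="login"):
    """True if any redirect target mentions login_marker (an assumed marker)."""
    for _status, _url, location in describe_redirects(response):
        if location and login_marker in location.lower():
            return True
    return False


if __name__ == "__main__":
    # Stand-in objects with the same attributes a requests.Response exposes.
    first_hop = SimpleNamespace(
        status_code=302,
        url="https://example.test/api",
        headers={"Location": "/Login?ReturnUrl=%2Fapi"},
        history=[],
    )
    final = SimpleNamespace(
        status_code=200,
        url="https://example.test/Login",
        headers={},
        history=[first_hop],
    )
    for hop in describe_redirects(final):
        print(hop)
    print(looks_like_login_redirect(final))
```

With the real session, `describe_redirects(s.get(base_url))` would show the same information for the actual request.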
If I open the API URL in the browser, it returns the login page and forces me to log out:
<form action="/Logout" method="get">
<input name="__RequestVerificationToken" type="hidden" value="y_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" />
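I think the page saves a token or session when I log in. When I scrape the content through the API, my token or session is empty, so it returns the login page content. Please help me solve this problem. Thanks for reading and for your help.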
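You can specify headers in get() and set the token value in the headers. It is best to open base_url in a browser and trace the request to see its payload format (including headers), so that you can mimic it.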
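A minimal sketch of that idea, under stated assumptions: it reads the hidden `__RequestVerificationToken` field from page HTML and builds a header dictionary to pass to get(). The header name `RequestVerificationToken`, the `X-Requested-With` header, and both helper names are assumptions for illustration; the real names should be taken from the request traced in the browser.

```python
from html.parser import HTMLParser


class _TokenParser(HTMLParser):
    """Collects the value of the first <input> whose name matches field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        if tag != "input" or self.token is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.field_name:
            self.token = attributes.get("value")


def extract_token(html, field_name="__RequestVerificationToken"):
    """Return the hidden token value found in html, or None if it is absent."""
    parser = _TokenParser(field_name)
    parser.feed(html)
    return parser.token


def build_headers(user_agent, token=None, token_header="RequestVerificationToken"):
    """Build request headers; token_header is an assumed name, check the traced request."""
    headers = {"User-Agent": user_agent, "X-Requested-With": "XMLHttpRequest"}
    if token:
        headers[token_header] = token
    return headers
```

With the session from the question, this could be used as `s.get(base_url, headers=build_headers(selenium_user_agent, extract_token(driver.page_source)))`. Whether the server expects the token in a header, a cookie, or a form field depends on the site.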
If base_url is not an API URL, an alternative is to locate a high-level HTML element on the page and get its HTML content directly.
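A minimal sketch of that alternative, assuming the already logged-in Selenium `driver` from the question has navigated to the results page. The helper name and the default `body` selector are hypothetical; a narrower selector for the results container would normally be better.

```python
def element_html(driver, css_selector="body"):
    """Return the outer HTML of the first element matching css_selector."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    element = driver.find_element("css selector", css_selector)
    return element.get_attribute("outerHTML")
```

Because the browser session is already authenticated, this avoids replaying cookies and tokens through requests; the returned HTML can then be parsed with any HTML parser.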