
Selenium Python: response history returns 302

I ran into a problem while trying to scrape content from a website by calling its API.

I use this code to log in to the page:

import time
from selenium import webdriver

options = webdriver.ChromeOptions()
#options.add_argument("--headless") # Runs Chrome in headless mode.
options.add_argument('--no-sandbox') # Bypass OS security model
options.add_argument('--disable-gpu')  # applicable to windows os only
options.add_argument('disable-infobars')
options.add_argument("--disable-extensions")
options.add_argument("AppData\\Local\\Google\\Chrome\\User Data\\Defaul") #Path to your chrome profile
driver = webdriver.Chrome(executable_path="\\chromedriver.exe", chrome_options=options)
#driver=webdriver.Chrome()
site="https:xxxxxx"
# Login credentials
username = "username"

password = "password"

# Open the login page
driver.get(site)
# find username/email field and send the username itself to the input field
driver.find_element_by_id("username").send_keys(username)
time.sleep(20)
# find password input field and insert password as well
driver.find_element_by_id("password").send_keys(password)
time.sleep(25)
# click login button
driver.find_element_by_id("btnLogin").click()

On the login page I see a login form whose token value changes on every login:

<form id="FormLogin" method="post" style="margin-top: 1%;"><input name="__RequestVerificationToken" type="hidden" value="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx">

After the login page, the site asks for an OTP:

otp="xxxxxx"
driver.find_element_by_id("otp").send_keys(otp)
driver.find_element_by_xpath("/html/body/div/div/div/div/button[1]").click()

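After logging in I can search and see the results in the browser, but whenever I try to fetch the results through the API, it returns None.

I tried to save my session: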

s = requests.Session()
# Set correct user agent
selenium_user_agent = driver.execute_script("return navigator.userAgent;")
s.headers.update({"user-agent": selenium_user_agent})

for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])

Then I use this code to fetch the content:

import requests
base_url = 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxPageNumber=1&CurrentSystemDate=12/08/2022&X-Requested-With=XMLHttpRequest'

response = s.get(base_url)
print(response.status_code) #it returns 200
print(response.history)  #it returns 302
print(response.content) # It returns login page content

If I open the API URL in the browser, it returns the login page and forces me to log out:

<form action="/Logout" method="get">
 <input name="__RequestVerificationToken" type="hidden" value="y_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" />

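I think the page stores a token or session at login. When I fetch the content through the API, my token or session is empty, so it returns the login page content. Please help me solve this problem. Thanks for reading and helping me.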

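You can pass headers to get() and set the token value in a header. It is best to open base_url in the browser and trace the request to see its payload format (including headers), so that you can mimic it.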
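A minimal sketch of this suggestion, assuming the endpoint expects an AJAX-style request. The helper name `build_api_headers`, the `RequestVerificationToken` header name, and the `Accept` value are assumptions and are not confirmed for this site; the real names and values should be copied from the request traced in the browser's network tab. Note also that the question puts `X-Requested-With=XMLHttpRequest` in the query string, whereas libraries such as jQuery send it as a request header.

```python
def build_api_headers(user_agent, token=None, referer=None,
                      token_header="RequestVerificationToken"):
    """Return a header dict that imitates a browser AJAX request.

    token_header is a guess; replace it with the header name seen
    in the request traced in the browser.
    """
    headers = {
        "User-Agent": user_agent,
        # Sent as a header here, not as a query-string parameter.
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json, text/javascript, */*; q=0.01",
    }
    if token:
        headers[token_header] = token
    if referer:
        headers["Referer"] = referer
    return headers


# Hypothetical values, only to show the shape of the result.
demo_headers = build_api_headers("Mozilla/5.0 (demo)", token="abc123")
for name, value in demo_headers.items():
    print(f"{name}: {value}")
```

With the session `s` from the question, the call could then look like `s.get(base_url, headers=build_api_headers(selenium_user_agent, token=...), allow_redirects=False)`. Passing `allow_redirects=False` makes a redirect to the login page visible directly in `response.status_code` and `response.headers["Location"]` instead of hiding it in `response.history`.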

If base_url is not an API URL, an alternative is to locate a high-level HTML element on the page and read its HTML content directly.
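A minimal sketch of this alternative, assuming the logged-in Selenium `driver` from the question already has the results page open. The helper name `element_html` and the default selector `body` are illustrative only; `"css selector"` is the string value behind Selenium's `By.CSS_SELECTOR`.

```python
def element_html(driver, css_selector="body"):
    """Return the outer HTML of the first element matching css_selector."""
    # "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR
    element = driver.find_element("css selector", css_selector)
    return element.get_attribute("outerHTML")
```

Because the browser session is the one that passed the login and OTP steps, no cookies or tokens have to be copied. A call such as `element_html(driver, "#results")`, where `#results` is a hypothetical selector, returns markup that can be handed to any HTML parser.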

