Logging in to a website with Python requests

Question

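Solution: For this particular site the form's action is action="user/ajax/login", so that is what has to be appended to the site's main URL to get the address the payload is posted to. (You can find the action by opening the page source and pressing Ctrl+F to search for "action".) url is the page that will be scraped. with requests.Session() as s: is what keeps the cookies within the site, which allows consistent scraping. The res variable is the response from posting the payload to the login URL, which lets the user scrape from pages specific to the account. After the post, the session sends a GET request for the specified url. With that, BeautifulSoup can now take and parse the HTML from the account page. In this case both "html.parser" and "lxml" are compatible. If there is HTML inside an iframe, it's doubtful it can be grabbed and parsed using only requests, so I recommend using selenium with Firefox.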

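import requests
from bs4 import BeautifulSoup

payload = {"username": "?????", "password": "?????"}  # your own credentials
url = "https://9anime.to/user/watchlist"  # page to scrape after logging in
loginurl = "https://9anime.to/user/ajax/login"  # main site URL + form action

with requests.Session() as s:  # the session keeps cookies between requests
    res = s.post(loginurl, data=payload)  # log in
    res = s.get(url)  # fetch the account page with the login cookies

soup = BeautifulSoup(res.text, "html.parser")  # "lxml" also works here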
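
In case it helps, the step of finding the form's action and appending it to the main URL can be sketched like this. It is only a sketch: FormActionParser and login_url_from_html are made-up helper names, the sample markup is invented rather than copied from the real login page, and it uses the standard library parser only so that it runs without extra installs (BeautifulSoup would do the same job).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and passes attrs as (name, value) pairs.
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def login_url_from_html(base_url, html):
    """Return the absolute URL of the first form action found, or None."""
    parser = FormActionParser()
    parser.feed(html)
    if not parser.actions:
        return None
    # urljoin resolves the relative action against the site's main URL.
    return urljoin(base_url, parser.actions[0])


# Invented markup standing in for the real login page source.
sample_html = '<form action="user/ajax/login" method="post"></form>'
login_url = login_url_from_html("https://9anime.to/", sample_html)
print(login_url)
```

On a real page there may be several forms, so you would probably want to check which action belongs to the login form rather than taking the first one.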

[Windows 10] To install Selenium, run pip3 install selenium. For the drivers: (Chrome: https://sites.google.com/a/chromium.org/chromedriver/downloads) (Firefox: https://github.com/mozilla/geckodriver/releases). How to put "geckodriver" on the PATH for Firefox Selenium: Control Panel > "Environment Variables" > "Path" > "New" > enter the file location of "geckodriver", and then you're all set. Also, to grab iframes when using selenium, try import time and call time.sleep(5) after the driver has "gotten" the url. This gives the site more time to load those extra iframes. Example:

import time
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()  # The WebDriver for this script
driver.get("https://www.google.com/")
time.sleep(5)  # Extra time for the iframe(s) to load
soup = BeautifulSoup(driver.page_source, "lxml")

print(soup.prettify())  # To see full HTML content
print(soup.find_all("iframe"))  # Finds all iframes

print(soup.find("iframe")["src"])  # If you need the 'src' from within an iframe.
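
A fixed time.sleep(5) either waits too long or not long enough, so one alternative is to poll until an iframe actually shows up. The following is a rough sketch under my own assumptions: wait_for_iframe is a hypothetical helper, not part of selenium, and the list of snapshots just fakes a page that loads its iframe late. With a real driver you would pass lambda: driver.page_source. Selenium also ships its own explicit-wait support (WebDriverWait), which is probably the better choice in real code.

```python
import time


def wait_for_iframe(get_page_source, timeout=10.0, interval=0.5):
    """Poll get_page_source() until the markup contains an <iframe> tag.

    get_page_source is any zero-argument callable returning HTML text.
    Returns the HTML once an iframe shows up, or the last HTML seen
    when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    html = get_page_source()
    while "<iframe" not in html.lower() and time.monotonic() < deadline:
        time.sleep(interval)
        html = get_page_source()
    return html


# Stand-in for a page whose iframe only appears on the third read.
snapshots = iter([
    "<html></html>",
    "<html></html>",
    "<html><iframe src='a'></iframe></html>",
])
html = wait_for_iframe(lambda: next(snapshots), timeout=2.0, interval=0.01)
print("<iframe" in html)
```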

Answer 1

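You are trying to make a GET request to a URL that requires you to be logged in, so it produces a 403 error, which means Forbidden: the request is not authenticated to view the content.

If you think about it in terms of the URL you would be constructing in a GET request, you would be exposing the username (x) and password (y) in the URL, like so: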

https://9anime.to/user/watchlist?username=x&password=y

...which is of course a security risk.
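
To make the difference concrete, here is a small sketch of where the same credentials end up with GET versus POST. It only builds strings with the standard library and sends nothing over the network; x and y are the placeholder values from above, and the login URL is the one given in the question's solution.

```python
from urllib.parse import urlencode

credentials = {"username": "x", "password": "y"}  # placeholder values

# Sent as GET parameters, the credentials become part of the URL itself.
get_url = "https://9anime.to/user/watchlist?" + urlencode(credentials)
print(get_url)

# Sent with POST as form data, the same pairs travel in the request body,
# so the URL that ends up in logs or browser history carries no secrets.
post_url = "https://9anime.to/user/ajax/login"
post_body = urlencode(credentials)
print(post_url)
print(post_body)
```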

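Without knowing what specific access you have to that particular site, in principle you need to simulate authentication with a POST request first and then perform a GET request on that page. A successful response will return a 200 status code ('OK'), and you will then be in a position to use BeautifulSoup to parse the content and target the part you want from between the relevant HTML tags.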
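
A minimal sketch of that POST-then-GET flow with the status checks spelled out might look like the following. fetch_after_login is a hypothetical helper name, and session is assumed to behave like requests.Session. Note that a 200 on the login request is not proof that the credentials were accepted, since some sites answer 200 with an error message in the body.

```python
def fetch_after_login(session, login_url, page_url, payload):
    """Log in with POST, then GET the protected page on the same session.

    session is expected to behave like requests.Session: post() and get()
    return objects with status_code and text attributes.
    """
    login_response = session.post(login_url, data=payload)
    if login_response.status_code != 200:
        raise RuntimeError(f"login failed with status {login_response.status_code}")

    page_response = session.get(page_url)
    if page_response.status_code != 200:
        raise RuntimeError(f"page request failed with status {page_response.status_code}")
    return page_response.text


# Possible usage (needs network access and real credentials):
#
#   import requests
#   with requests.Session() as s:
#       html = fetch_after_login(
#           s,
#           "https://9anime.to/user/ajax/login",
#           "https://9anime.to/user/watchlist",
#           {"username": "?????", "password": "?????"},
#       )
```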

Answer 2

I suggest that, first, you give the address of the login page and connect. Then you add an

input('Enter something')

which lets you pause while you log in (you have to press ENTER in the terminal to continue the process once you are connected, and voilà).
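
Reading this as a browser-driven approach (that is my assumption, the answer does not say so), the pause could be sketched roughly as below. fetch_after_manual_login is a hypothetical helper, driver is assumed to behave like a selenium WebDriver, and the login page address is left as a parameter because it is not given here.

```python
def fetch_after_manual_login(driver, login_url, target_url, pause=input):
    """Open the login page, wait for the user, then return the target page HTML.

    driver is expected to behave like a selenium WebDriver (get() and
    page_source); pause is called once and, with the default input(),
    blocks until ENTER is pressed in the terminal.
    """
    driver.get(login_url)
    pause("Log in in the browser window, then press ENTER here to continue: ")
    driver.get(target_url)
    return driver.page_source


# Possible usage (needs selenium, a browser driver, and a real login page):
#
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   html = fetch_after_manual_login(
#       driver, LOGIN_PAGE_URL, "https://9anime.to/user/watchlist"
#   )
#
# LOGIN_PAGE_URL is a placeholder for whatever the site's login page is.
```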

Answer 3

Solved: in this case the form's action attribute is user/ajax/login. So by appending that to the main URL of the website (not https://9anime.to/user/watchlist but https://9anime.to) you get https://9anime.to/user/ajax/login, and that gives you the login URL.

import requests
from bs4 import BeautifulSoup as bs
url = "https://9anime.to/user/watchlist"
loginurl = "https://9anime.to/user/ajax/login"
payload = {"username":"?????", "password":"?????"}
with requests.Session() as s:
    res = s.post(loginurl, data=payload)
    res = s.get(url)

3 answers

Answer 1
Votes: 1, 2020-05-24 20:58:27

Answer 2
Votes: 0, 2020-05-24 20:40:01

Answer 3
Votes: 0, accepted, 2020-05-25 03:22:51
