Python: Web scraping with requests-html not working

Question

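I am trying to scrape data from a trading website. I started with the Python 'requests' library, but the HTML it returns is different from the page I see in my browser.

I noticed that the web page loads the missing information after a slight delay, and while researching this I found that it can be handled with the 'requests-html' package. However, 'requests-html' returns the same HTML as 'requests'.

I know this can be solved with Selenium, but is there a way to do it with the libraries mentioned above?

Here is my code: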

from bs4 import BeautifulSoup
import requests
import time
from requests_html import HTMLSession

with HTMLSession() as s:
    login_url = 'https://www.screener.in/login/'
    USERNAME = "username"
    PASSWORD = "password"

    s.get(login_url)
    csrftoken = s.cookies['csrftoken']

    login_data = dict(csrfmiddlewaretoken=csrftoken, next='', username=USERNAME, password=PASSWORD)
    s.post(login_url, data=login_data, headers={"Referer": "https://www.screener.in/"})

    r = s.get('https://www.screener.in/company/ABBOTINDIA/')
    r.html.render(timeout=10, sleep=10)
    print(r.html.html)

Where am I going wrong? Is there a problem with the headers?

I am new to web scraping, so any help is much appreciated.

Answer 1

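csrftoken and csrfmiddlewaretoken are not the same thing.

csrfmiddlewaretoken needs to be sent in the POST form data (it is the value of the hidden input in the login form), while csrftoken needs to be sent as a cookie.

They also have (at least for me) different values, so copying the cookie into the form field, as the code in the question does, sends the wrong token.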
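As a minimal sketch of that fix, the form token can be read from the login page's HTML instead of from the cookie. This assumes the login form contains a hidden `<input name="csrfmiddlewaretoken">`, which is the usual Django layout but is not confirmed for this site. The helper names `extract_csrf_middleware_token` and `login` are hypothetical, and using the login URL as the Referer is also an assumption.

```python
from html.parser import HTMLParser


class _CsrfInputParser(HTMLParser):
    """Records the value of the first <input name="csrfmiddlewaretoken">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == "csrfmiddlewaretoken":
            if self.token is None:
                self.token = attr_map.get("value")


def extract_csrf_middleware_token(html):
    """Return the hidden form token found in the given HTML (hypothetical helper)."""
    parser = _CsrfInputParser()
    parser.feed(html)
    if parser.token is None:
        raise ValueError("no csrfmiddlewaretoken input found in the page")
    return parser.token


def login(session, login_url, username, password):
    """Log in with the form token; `session` is a requests-style session."""
    # The GET stores the csrftoken cookie on the session and returns the form HTML.
    page = session.get(login_url)
    form_data = {
        # Token taken from the form, not from session.cookies['csrftoken'].
        "csrfmiddlewaretoken": extract_csrf_middleware_token(page.text),
        "username": username,
        "password": password,
        "next": "",
    }
    # Assumption: the login URL itself is an acceptable Referer for the site.
    return session.post(login_url, data=form_data, headers={"Referer": login_url})
```

With the code from the question, this could be called as `login(s, login_url, USERNAME, PASSWORD)` inside the `with HTMLSession() as s:` block, before requesting the company page. Whether a successful login alone makes `r.html.render()` return the missing data is not something the answer confirms.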

Solution 1
0 2021-01-11 10:35:05
