How can I scrape next-page data with Python when the next page is loaded by JavaScript and the URL does not change?

Question

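I am trying to scrape a web page with Python. I have successfully scraped the first page, but I cannot get to the next page: the next-page URL is the same, and the next page is loaded by JavaScript.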

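import requests
import bs4 as bs

url = 'https://scamalert.sg/scam-details'
# Fetch the first page; requests does not run JavaScript, so only the initial HTML is returned
r = requests.get(url)
htmlcontent = r.content
soup = bs.BeautifulSoup(htmlcontent, 'html.parser')

# Print the title of every story card on the page
for tag in soup.find_all('h4', {"class": "card-title"}):
    print(tag.text)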

Screenshot of the site's HTML: https://i.stack.imgur.com/8zV9y.png

<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick('2')">2</a>
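Because each pagination link points at javascript:void(0), there is no URL for requests to follow; the page number exists only as the argument to pagingOnClick. A minimal stdlib-only sketch that reads those numbers out of the HTML, assuming the markup looks like the snippet above. The second link in the sample and the names `PageLinkParser` and `find_page_numbers` are hypothetical, and it is an assumption that this number is the page value the site's JavaScript sends to the server.

```python
import re
from html.parser import HTMLParser

# Matches the page number inside an onclick handler such as pagingOnClick('2')
PAGING_RE = re.compile(r"pagingOnClick\('(\d+)'\)")


class PageLinkParser(HTMLParser):
    """Hypothetical helper: collect page numbers from pagination <a> tags."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Only consider links carrying the page-link class
        if "page-link" not in (attr.get("class") or "").split():
            return
        match = PAGING_RE.search(attr.get("onclick") or "")
        if match:
            self.pages.append(int(match.group(1)))


def find_page_numbers(html):
    """Return the page numbers referenced by pagingOnClick(...) links, in document order."""
    parser = PageLinkParser()
    parser.feed(html)
    return parser.pages


# Hypothetical sample markup modelled on the snippet above
snippet = (
    '<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick(\'2\')">2</a>\n'
    '<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick(\'3\')">3</a>'
)
print(find_page_numbers(snippet))  # [2, 3]
```

In practice you would feed it the `htmlcontent` fetched in the question's code to discover how many pages exist.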


Answer 1

Here is one way to get all the stories, along with the links to their detail pages, by walking through every "next" page of the site. If you check Chrome DevTools, you will notice that POST HTTP requests are sent to https://scamalert.sg/scam-details/GetStoryListAjax/ with the appropriate parameters, and the responses contain JSON from which you can extract the desired fields.

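import json
import requests

base = 'https://scamalert.sg{}'
link = 'https://scamalert.sg/scam-details/GetStoryListAjax/'

# Form fields sent with each POST; empty strings mean "no filter"
payload = {
    'scamType': '',
    'year': '',
    'month': '',
    'sortBy': 'Latest'
}

page = 1
while True:
    payload['page'] = page
    r = requests.post(link, data=payload)
    r.raise_for_status()
    # The 'result' field is itself a JSON-encoded string, so decode it a second time
    items = json.loads(r.json()['result'])['StoryList']
    # Stop once a page comes back (nearly) empty
    if len(items) <= 1:
        break
    for item in items:
        # item['Url'] is a site-relative path; prefix it with the domain
        print(item['Title'], base.format(item['Url']))
    page += 1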
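The loop above can be split into a decoding step and a paging step so it can be exercised without hitting the live site. This is a sketch under the assumption that the response has exactly the shape the answer's code implies (an outer JSON object whose 'result' field is a JSON string containing 'StoryList'); `parse_story_list`, `iter_stories`, `fake_fetch` and its canned data are all hypothetical.

```python
import json
from typing import Callable, Iterator, Tuple

BASE = "https://scamalert.sg{}"


def parse_story_list(body: dict) -> list:
    """Decode one response body: 'result' holds a JSON string, so json.loads it again."""
    return json.loads(body["result"])["StoryList"]


def iter_stories(fetch_page: Callable[[int], dict]) -> Iterator[Tuple[str, str]]:
    """Yield (title, absolute_url) pairs, page by page.

    fetch_page(page) must return the outer decoded JSON (what r.json() gives);
    injecting it keeps the paging logic testable offline.
    """
    page = 1
    while True:
        items = parse_story_list(fetch_page(page))
        # Same stop rule as the answer: quit when a page has one item or fewer
        if len(items) <= 1:
            break
        for item in items:
            yield item["Title"], BASE.format(item["Url"])
        page += 1


# Hypothetical canned responses standing in for the real endpoint
FAKE_PAGES = {
    1: [{"Title": "Story A", "Url": "/stories/a"},
        {"Title": "Story B", "Url": "/stories/b"}],
}


def fake_fetch(page: int) -> dict:
    # Mimic the double encoding: the story list is serialized into a string
    return {"result": json.dumps({"StoryList": FAKE_PAGES.get(page, [])})}


for title, url in iter_stories(fake_fetch):
    print(title, url)
```

To run it against the real endpoint, pass a function that performs the POST, for example `iter_stories(lambda p: requests.post(link, data={**payload, 'page': p}).json())`, using `link` and `payload` from the answer. The `len(items) <= 1` stop rule is copied from the answer; if the last page can hold a single story, `if not items` would avoid dropping it.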

Solution 1
1 2020-05-10 13:22:28