
How can I scrape next-page data with Python if the next page loads with JavaScript and the URL does not change?

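I am trying to scrape a web page with Python. I have successfully scraped the first page, but I cannot go to the next page, because the next page has the same URL and is loaded with JavaScript.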

import requests
import bs4 as bs

url = 'https://scamalert.sg/scam-details'
r = requests.get(url)
htmlcontent = r.content
soup = bs.BeautifulSoup(htmlcontent, 'html.parser')

for tag in soup.find_all('h4', {"class": "card-title"}):
    print(tag.text)

[Website HTML][1]

[1]: https://i.stack.imgur.com/8zV9y.png

<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick('2')">2</a>

  • This is one way to fetch all the stories, together with their links to the detail pages, while traversing every next page of the site. If you check the Chrome DevTools, you will notice that POST requests are made to the URL https://scamalert.sg/scam-details/GetStoryListAjax/ along with the appropriate parameters. The response is JSON content from which you can extract the desired fields.

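    import json
    import requests

    base = 'https://scamalert.sg{}'
    link = 'https://scamalert.sg/scam-details/GetStoryListAjax/'

    # Form fields sent with each POST request; 'page' is added inside the loop.
    payload = {
        'scamType': '',
        'year': '',
        'month': '',
        'sortBy': 'Latest'
    }

    page = 1
    while True:
        payload['page'] = page
        r = requests.post(link, data=payload)
        # 'result' holds a JSON-encoded string, so it is decoded a second time.
        items = json.loads(r.json()['result'])['StoryList']
        # Stop once a page returns at most one item.
        if len(items) <= 1:
            break
        for item in items:
            # 'Url' is a site-relative path, so prepend the domain.
            print(item['Title'], base.format(item['Url']))

        page += 1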
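
The same loop can also be split into small functions, which makes it easier to try the paging logic without touching the network. This is only a sketch: the names `decode_story_list`, `iter_stories`, `fetch_with_requests`, `fake_fetch`, the `max_pages` cap, and the 30-second timeout are hypothetical additions, and the response shape (a `result` field holding a JSON string with a `StoryList` key) is assumed from the code above rather than from any documented API.

```python
import json
from typing import Callable, Dict, Iterator, List


def decode_story_list(body: Dict) -> List[Dict]:
    """Decode the nested JSON: 'result' is itself a JSON-encoded string."""
    return json.loads(body["result"])["StoryList"]


def iter_stories(fetch_page: Callable[[int], Dict],
                 max_pages: int = 1000) -> Iterator[Dict]:
    """Yield stories page by page until a page has at most one item."""
    for page in range(1, max_pages + 1):  # max_pages is a safety cap (assumption)
        items = decode_story_list(fetch_page(page))
        if len(items) <= 1:  # same stop condition as the loop above
            break
        yield from items


def fetch_with_requests(page: int) -> Dict:
    """Live fetcher: POST the form fields from the answer (needs network)."""
    import requests  # imported lazily so the helpers above also work offline

    payload = {"scamType": "", "year": "", "month": "",
               "sortBy": "Latest", "page": page}
    r = requests.post("https://scamalert.sg/scam-details/GetStoryListAjax/",
                      data=payload, timeout=30)
    r.raise_for_status()
    return r.json()


def fake_fetch(page: int) -> Dict:
    """Hypothetical stand-in that mimics the assumed response shape."""
    pages = {
        1: [{"Title": "A", "Url": "/a"}, {"Title": "B", "Url": "/b"}],
        2: [{"Title": "C", "Url": "/c"}, {"Title": "D", "Url": "/d"}],
    }
    return {"result": json.dumps({"StoryList": pages.get(page, [])})}


# Offline check of the paging logic with made-up data.
titles = [story["Title"] for story in iter_stories(fake_fetch)]
```

To run it against the real site, pass the live fetcher instead: `for story in iter_stories(fetch_with_requests): print(story["Title"])`. Keeping the fetcher as a parameter means the stop condition can be checked with made-up pages first.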
    

