
How can I scrape next-page data with Python if the next page loads via JavaScript and the URL doesn't change?

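I am trying to scrape a web page with Python. I have successfully scraped the first page, but I can't get to the next page, because the next-page URL is the same and the next page is loaded by JavaScript.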

import requests
import bs4 as bs

url = 'https://scamalert.sg/scam-details'
r = requests.get(url)
htmlcontent = r.content
soup = bs.BeautifulSoup(htmlcontent, 'html.parser')

# Story titles are <h4 class="card-title"> elements; this HTML only contains page 1
for tag in soup.find_all('h4', {"class": "card-title"}):
    print(tag.text)

Website HTML (screenshot): https://i.stack.imgur.com/8zV9y.png

<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick('2')">2</a>
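Because the link's `href` is only `javascript:void(0)`, the page number exists in the markup only as the argument to `pagingOnClick`. Below is a minimal sketch, assuming the pagination links follow the pattern above, that collects those numbers with Python's standard-library `html.parser`; `PagingLinkParser` and `paging_numbers` are hypothetical names. It only reads the numbers: `requests` and BeautifulSoup do not execute JavaScript, so getting a page's content still means sending whatever request `pagingOnClick` triggers.

```python
import re
from html.parser import HTMLParser


class PagingLinkParser(HTMLParser):
    """Collect the page numbers passed to pagingOnClick(...) in <a> onclick attributes."""

    # Matches pagingOnClick('2') or pagingOnClick("2"), capturing the digits
    PATTERN = re.compile(r"""pagingOnClick\(\s*['"]?(\d+)['"]?\s*\)""")

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are pagination links in the markup shown above
        if tag != "a":
            return
        onclick = dict(attrs).get("onclick") or ""
        match = self.PATTERN.search(onclick)
        if match:
            self.pages.append(int(match.group(1)))


def paging_numbers(html):
    """Return the page numbers found in pagingOnClick links, in document order."""
    parser = PagingLinkParser()
    parser.feed(html)
    parser.close()
    return parser.pages


sample = (
    '<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick(\'2\')">2</a>'
    '<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick(\'3\')">3</a>'
)
print(paging_numbers(sample))  # [2, 3]
```

For example, `max(paging_numbers(html))` gives the highest page number linked from the current pagination bar, which may be lower than the total page count if the bar is truncated.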

  • This is one way to get all the stories, along with the links to their detail pages, by traversing every next page of the site. If you check Chrome DevTools (Network tab), you will notice that POST requests are sent to https://scamalert.sg/scam-details/GetStoryListAjax/ with the appropriate parameters, and the server responds with JSON from which you can extract the desired fields.

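    import json
    import requests
    
    base = 'https://scamalert.sg{}'
    link = 'https://scamalert.sg/scam-details/GetStoryListAjax/'
    
    # Form parameters seen on the POST request in DevTools; 'page' is set per iteration
    payload = {
        'scamType': '',
        'year': '',
        'month': '',
        'sortBy': 'Latest'
    }
    
    page = 1
    while True:
        payload['page'] = page
        r = requests.post(link, data=payload)
        r.raise_for_status()  # raise HTTPError on a 4xx/5xx response
        # 'result' is itself a JSON-encoded string, so it is decoded a second time
        items = json.loads(r.json()['result'])['StoryList']
        # Stop when a page returns one story or fewer
        if len(items) <= 1:
            break
        for item in items:
            # 'Url' is a site-relative path, so prefix it with the domain
            print(item['Title'], base.format(item['Url']))
    
        page += 1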
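
The same loop can be sketched with the HTTP call factored out, which makes the paging and the double JSON decoding testable without contacting the site. This is a minimal sketch, not the answer's exact code: `iter_stories`, `fetch_page` and `max_pages` are hypothetical names, and the response shape (`result` as a JSON string containing `StoryList`) is taken from the answer's code rather than from any documentation of the endpoint.

```python
import json
from typing import Callable, Iterator


def iter_stories(fetch_page: Callable[[int], dict], max_pages: int = 500) -> Iterator[dict]:
    """Yield stories from pages 1, 2, 3, ... until a page has an empty StoryList.

    fetch_page(page) must return the decoded outer JSON object of one
    GetStoryListAjax response (for example, requests' r.json()).
    max_pages is a hypothetical safety cap so a misbehaving endpoint cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        outer = fetch_page(page)
        # 'result' holds a second JSON document encoded as a string
        stories = json.loads(outer["result"])["StoryList"]
        if not stories:  # assumption: an empty StoryList means we are past the last page
            return
        yield from stories


# Offline demo: two fake pages, then empty pages from page 3 on
fake_pages = {
    1: {"result": json.dumps({"StoryList": [{"Title": "A", "Url": "/a"}, {"Title": "B", "Url": "/b"}]})},
    2: {"result": json.dumps({"StoryList": [{"Title": "C", "Url": "/c"}]})},
}
empty_page = {"result": json.dumps({"StoryList": []})}

titles = [story["Title"] for story in iter_stories(lambda p: fake_pages.get(p, empty_page))]
print(titles)  # ['A', 'B', 'C']
```

To run it against the live site, pass a function that performs the POST from the answer, e.g. `iter_stories(lambda p: requests.post(link, data={**payload, 'page': p}).json())`. Unlike the answer's `len(items) <= 1` check, this version stops only on an empty page, so a last page holding a single story is not dropped; whether the endpoint really returns an empty `StoryList` past the last page is not confirmed here, so check it in DevTools first.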
    


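    Notice: the technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.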

     
    Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM