How can I scrape the next page's data with Python if the next page is loaded with JavaScript and the URL does not change?

Question

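I am trying to scrape a web page with Python. I have successfully scraped the first page, but I cannot go to the next page, because the next-page URL is the same and the next page is loaded with JavaScript.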

import requests
import bs4 as bs

url = 'https://scamalert.sg/scam-details'
r = requests.get(url)
htmlcontent = r.content
soup = bs.BeautifulSoup(htmlcontent, 'html.parser')

# Print the title of every story card found on the first page
for tag in soup.find_all('h4', {"class": "card-title"}):
    print(tag.text)

[Website HTML][1]

[1]: https://i.stack.imgur.com/8zV9y.png

<a class="page-link" href="javascript:void(0)" onclick="pagingOnClick('2')">2</a>

Answer 1

Here is one way to get all the stories and their links to the detail pages while walking through every subsequent page of the site. If you check the Chrome dev tools, you will notice that HTTP POST requests are made to this URL, https://scamalert.sg/scam-details/GetStoryListAjax/, along with the appropriate parameters, and that they return JSON content from which you can extract the desired fields.

import json
import requests

base = 'https://scamalert.sg{}'
link = 'https://scamalert.sg/scam-details/GetStoryListAjax/'

payload = {
    'scamType': '',
    'year': '',
    'month': '',
    'sortBy': 'Latest'
}

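page = 1
while True:
    payload['page'] = page
    r = requests.post(link, data=payload)
    # 'result' is itself a JSON-encoded string, so it is decoded a second time
    items = json.loads(r.json()['result'])['StoryList']
    # Stop once a page comes back with at most one item
    if len(items) <= 1:
        break
    for item in items:
        print(item['Title'], base.format(item['Url']))

    page += 1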
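
If you want to reuse the loop above, a slightly more defensive variant might look like the sketch below. It is only an illustration: the helper names (parse_story_list, fetch_page, iter_stories), the timeout, and the max_pages cap are my own assumptions and not part of the site's API, and the stop condition is simply copied from the loop above. The fetch function is passed in as a parameter so the paging logic can be tried without touching the network.

```python
import json
from typing import Callable, Dict, Iterator, List

LINK = 'https://scamalert.sg/scam-details/GetStoryListAjax/'
BASE = 'https://scamalert.sg{}'


def parse_story_list(response_json: Dict) -> List[Dict]:
    """Extract the story list from one decoded response."""
    # 'result' holds a JSON-encoded string, so it is decoded a second time
    return json.loads(response_json['result'])['StoryList']


def fetch_page(page: int) -> Dict:
    """POST the same form fields as the loop above for one page (hypothetical helper)."""
    # Imported here so the parsing helpers can be used without requests installed
    import requests

    payload = {
        'scamType': '',
        'year': '',
        'month': '',
        'sortBy': 'Latest',
        'page': page,
    }
    r = requests.post(LINK, data=payload, timeout=30)  # timeout is an assumption
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return r.json()


def iter_stories(fetch: Callable[[int], Dict] = fetch_page,
                 max_pages: int = 1000) -> Iterator[Dict]:
    """Yield story dicts page by page until the stop condition is hit."""
    # max_pages is an assumed safety cap so the loop cannot run forever
    for page in range(1, max_pages + 1):
        items = parse_story_list(fetch(page))
        # Same stop condition as the loop above
        if len(items) <= 1:
            break
        yield from items
```

To print the same output as before, you would loop with `for story in iter_stories():` and call `print(story['Title'], BASE.format(story['Url']))` inside it.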
