Python scraping pages after hitting the “load more news” button

Question

I can use the following code to scrape the first page of a finance news website:

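import pandas as pd
import requests

df = pd.DataFrame()  # placeholder for the scraped rows
url = 'https://std.stheadline.com/realtime/finance/%E5%8D%B3%E6%99%82-%E8%B2%A1%E7%B6%93'
result = requests.get(url)
result.raise_for_status()  # stop on HTTP 4xx/5xx errors
result.encoding = "utf-8"  # decode the response body as UTF-8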

To download the subsequent pages, I need to click the "load more news" button. I checked the website using Chrome > Inspect > Network and found that, after clicking "load more news", the Request URL is "https://std.stheadline.com/realtime/get_more_news" and the form data is "cat=finance&page=3". I put the two together with a "?" in between, but the resulting URL does not work. Is something missing?

url="https://std.stheadline.com/realtime/get_more_news?cat=finance&page=3"
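Joining the two with "?" does not reproduce the browser's request. "Form data" in Chrome's Network panel means the fields travel in the request body of a POST, whereas a URL with "?cat=...&page=..." is a GET with a query string. The sketch below builds both with `requests.Request(...).prepare()` without sending anything, so the difference is visible offline; `BASE` and `form` are illustrative names, not part of the site's API.

```python
import requests

BASE = "https://std.stheadline.com/realtime/get_more_news"
form = {"cat": "finance", "page": 3}

# What the question tried: the form fields appended after "?" on a GET
get_req = requests.Request("GET", BASE, params=form).prepare()

# What the browser sends: the same fields form-encoded in a POST body
post_req = requests.Request("POST", BASE, data=form).prepare()

print(get_req.method, get_req.url, get_req.body)
# GET https://std.stheadline.com/realtime/get_more_news?cat=finance&page=3 None
print(post_req.method, post_req.url, post_req.body)
# POST https://std.stheadline.com/realtime/get_more_news cat=finance&page=3
print(post_req.headers["Content-Type"])
# application/x-www-form-urlencoded
```

Sending the second form, as the answer below does with `requests.post(..., data=payload)`, is what mirrors the button click.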

Answer 1

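That button actually sends a POST request to an API endpoint, so there is no need to build a GET URL; just send the same POST request yourself. Appending the form data to the URL after "?" turns it into a query string on a GET, which is not what the browser sends.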

Here's how:

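import requests

# Mimic the browser's AJAX call: same Referer, a browser User-Agent,
# and the X-Requested-With header that marks an XMLHttpRequest.
headers = {
    "Referer": "https://std.stheadline.com/realtime/finance/",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:83.0) Gecko/20100101 Firefox/83.0",
    "X-Requested-With": "XMLHttpRequest",
}
# The same fields shown under "Form Data" in the Network tab.
payload = {
    "cat": "finance",
    "page": 4,
}
# data= sends them form-encoded in the POST body, not as a query string.
print(requests.post("https://std.stheadline.com/realtime/get_more_news/", data=payload, headers=headers).json())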

This returns the next page of news (the same data the button loads) as decoded JSON.
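To fetch several pages rather than one, the same POST can be repeated with an increasing `page` value. This is a minimal sketch, assuming the endpoint keeps accepting `cat`/`page` as in the answer's code and returns JSON for every page; `fetch_more_news`, `MORE_NEWS_URL`, and `HEADERS` are hypothetical names, and the right starting page depends on how many pages the site has already rendered (the question observed `page=3`).

```python
import requests

# Hypothetical names; values copied from the answer's request above.
MORE_NEWS_URL = "https://std.stheadline.com/realtime/get_more_news/"
HEADERS = {
    "Referer": "https://std.stheadline.com/realtime/finance/",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:83.0) Gecko/20100101 Firefox/83.0",
    "X-Requested-With": "XMLHttpRequest",
}


def fetch_more_news(session: requests.Session, cat: str, first_page: int, last_page: int) -> list:
    """Replay the "load more news" POST once for each page number.

    Returns (page, decoded_json) pairs. The thread does not describe the
    JSON layout, so each payload is returned as-is for the caller to inspect.
    """
    results = []
    for page in range(first_page, last_page + 1):
        # Same form fields the button sends: cat=<category>&page=<n>
        resp = session.post(
            MORE_NEWS_URL,
            data={"cat": cat, "page": page},
            headers=HEADERS,
            timeout=10,
        )
        resp.raise_for_status()  # raise on 4xx/5xx instead of calling .json() on an error page
        results.append((page, resp.json()))
    return results
```

Usage: `with requests.Session() as s: pages = fetch_more_news(s, "finance", 3, 5)`. Reusing one `Session` keeps the connection and any cookies between calls. The thread does not say how the site signals the last page, so inspect the returned payloads to decide when to stop.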

Answer 1 (accepted), 2020-11-30 14:23:53