
Scraping a website whose URL doesn't change when clicking the "next page" button

I'm trying to scrape a BBC News topic page

https://www.bbc.com/news/topics/c95yz8vxvy8t/hong-kong-anti-government-protests

and I would like to get all the news articles. However, the URL doesn't change when I click the next page button, so I can only get the information on the first page. Can anyone help? I'm using Selenium, but I'm familiar with requests too. Thanks!

Open the developer console in your browser, go to the Network tab, and disable the cache. You will see an API request being made on each page change. You don't need Selenium; you can just use requests or aiohttp.

This is an example (the request for page 2): https://push.api.bbci.co.uk/batch?t=%2Fdata%2Fbbc-morph-lx-commentary-data-paged%2Fabout%2Fd5803bfc-472d-4abf-b334-d3fc4aa8ebf9%2FisUk%2Ffalse%2Flimit%2F20%2FnitroKey%2Flx-nitro%2FpageNumber%2F2%2Fversion%2F1.5.6?timeout=5
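As a rough sketch, the example URL can be rebuilt for any page number by filling in the path and percent-encoding it. The helper names `build_batch_url` and `fetch_page` are made up for illustration, the path segments (including `version/1.5.6` and `nitroKey/lx-nitro`) are copied from the example request and may have changed on the live site, and the shape of the JSON response is not assumed. The sketch uses the standard library's `urllib` so it runs without extra packages; `requests.get(url).json()` should work the same way.

```python
import json
import urllib.request
from urllib.parse import quote

BATCH_ENDPOINT = "https://push.api.bbci.co.uk/batch"


def build_batch_url(about_id, page_number, limit=20):
    """Rebuild the example batch URL for a given topic ID and page number."""
    # Path segments are copied from the example request; the version and
    # nitroKey values are assumptions that may differ on the live site.
    template = (
        f"/data/bbc-morph-lx-commentary-data-paged/about/{about_id}"
        f"/isUk/false/limit/{limit}/nitroKey/lx-nitro"
        f"/pageNumber/{page_number}/version/1.5.6"
    )
    # safe="" makes quote() encode "/" as %2F, matching the example URL.
    return f"{BATCH_ENDPOINT}?t={quote(template, safe='')}?timeout=5"


def fetch_page(about_id, page_number, timeout=10):
    """Fetch one page and return the parsed JSON (structure not assumed)."""
    url = build_batch_url(about_id, page_number)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_page("d5803bfc-472d-4abf-b334-d3fc4aa8ebf9", 2)` would request the same page as the example URL. To collect every article you would call it with increasing page numbers and stop once a request fails or returns nothing new; how to detect that depends on the response, so inspect one response in the browser first.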

Type "batch" in the filter bar and you should see only the API calls that I believe are responsible for the page change.

You can find the "about" ID (d5803bfc-472d-4abf-b334-d3fc4aa8ebf9) for this topic in the page source, in this case the source of https://www.bbc.com/news/topics/c95yz8vxvy8t/hong-kong-anti-government-protests
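One possible way to locate that ID programmatically, sketched under the assumption that it appears in the HTML as a plain UUID string (the answer does not say where in the source it sits): list every UUID-shaped string in the page and rank them by frequency. The helper name `candidate_about_ids` is hypothetical, and the result is only a list of candidates to compare against the ID seen in the "batch" request.

```python
import re
from collections import Counter

# Matches the 8-4-4-4-12 hex layout of IDs such as
# d5803bfc-472d-4abf-b334-d3fc4aa8ebf9.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def candidate_about_ids(html):
    """Return UUID-shaped strings found in html, most frequent first."""
    counts = Counter(UUID_PATTERN.findall(html.lower()))
    return [uuid for uuid, _ in counts.most_common()]
```

Pass it the text of the topic page (for example the body returned by `requests.get(...).text`) and check the candidates by hand; the page may contain other UUIDs that are unrelated to the topic.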

