Scraping a website whose URL doesn't change when clicking the "next page" button

I'm trying to scrape a BBC website:

https://www.bbc.com/news/topics/c95yz8vxvy8t/hong-kong-anti-government-protests

I would like to get all the news articles, but the URL doesn't change when I click the "next page" button, so I can only get the first page's information. Can anyone help? I'm using Selenium but am also familiar with requests. Thanks!

Use the developer console in your browser: go to the Network tab and disable the cache. You can see the API requests that are made on each page change. You don't need Selenium; you can just use requests or aiohttp.
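Once the request behind the "next page" button has been found, paging becomes an ordinary loop over the page number. Below is a minimal sketch using only the standard library (requests.get(url).json() would do the same job as fetch_json). The helper names, the User-Agent header and the stop condition (an empty page is taken to mean the end) are assumptions for illustration, not confirmed behaviour of the BBC API.

```python
import json
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str, timeout: float = 10.0) -> object:
    """Download a URL and decode its JSON body (stdlib equivalent of requests.get(url).json())."""
    # Assumption: a browser-like User-Agent; some endpoints reject the default one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def iter_pages(url_for_page: Callable[[int], str],
               extract_items: Callable[[object], list],
               fetch: Callable[[str], object] = fetch_json,
               start: int = 1,
               max_pages: int = 50) -> Iterator[object]:
    """Yield items page by page until a page is empty or max_pages pages were requested."""
    for page in range(start, start + max_pages):
        items = extract_items(fetch(url_for_page(page)))
        if not items:
            break  # assumption: an empty page means we have run past the last one
        yield from items
```

Here url_for_page is expected to map a page number to the captured API URL with its page parameter replaced, and extract_items pulls the article list out of the decoded JSON; the shape of that JSON has to be read from the actual response in the Network tab. The max_pages cap keeps the loop from running forever if the stop assumption turns out to be wrong.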

This is an example:
https://push.api.bbci.co.uk/batch?t=%2Fdata%2Fbbc-morph-lx-commentary-data-paged%2Fabout%2Fd5803bfc-472d-4abf-b334-d3fc4aa8ebf9%2FisUk%2Ffalse%2Flimit%2F20%2FnitroKey%2Flx-nitro%2FpageNumber%2F2%2Fversion%2F1.5.6?timeout=5
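Percent-decoding the t parameter of that URL shows it is a path of key/value segments (about, isUk, limit, nitroKey, pageNumber, version), so other pages can presumably be requested by changing only pageNumber. A minimal sketch that rebuilds the URL, assuming all other segments stay as captured; the function name and its defaults are hypothetical, and since this is an undocumented internal endpoint its format may change at any time.

```python
from urllib.parse import quote

BATCH_ENDPOINT = "https://push.api.bbci.co.uk/batch"


def build_batch_url(about_id: str, page: int, limit: int = 20,
                    version: str = "1.5.6", timeout: int = 5) -> str:
    """Rebuild the captured batch URL for a given topic ID and page number."""
    # Key/value segments copied from the example request; only about_id and page vary here.
    path = "/".join([
        "", "data", "bbc-morph-lx-commentary-data-paged",
        "about", about_id,
        "isUk", "false",
        "limit", str(limit),
        "nitroKey", "lx-nitro",
        "pageNumber", str(page),
        "version", version,
    ])
    # safe="" encodes "/" as %2F, matching the request seen in the Network tab,
    # which also carries a literal "?timeout=" after the encoded path.
    return f"{BATCH_ENDPOINT}?t={quote(path, safe='')}?timeout={timeout}"
```

For example, build_batch_url("d5803bfc-472d-4abf-b334-d3fc4aa8ebf9", 2) reproduces the URL above exactly, and passing 3 instead of 2 gives the next page.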

Type "batch" in the filter bar and you should see only the API calls that I believe are responsible for the page change.

You can get the "about" ID of this topic (d5803bfc-472d-4abf-b334-d3fc4aa8ebf9) from the webpage source, in this case from https://www.bbc.com/news/topics/c95yz8vxvy8t/hong-kong-anti-government-protests
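A minimal sketch for pulling candidate IDs out of the topic page's source with a regular expression. It assumes the about ID appears in the HTML as a standard 8-4-4-4-12 hexadecimal UUID; the page may contain other UUIDs as well, so the right one should be confirmed against the ID seen in the batch request. The function names are hypothetical.

```python
import re
import urllib.request

# Standard 8-4-4-4-12 hex UUID, e.g. d5803bfc-472d-4abf-b334-d3fc4aa8ebf9
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset from the response headers."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def candidate_about_ids(html: str) -> list:
    """Return every distinct UUID in the source, lower-cased, in order of first appearance."""
    return list(dict.fromkeys(m.lower() for m in UUID_RE.findall(html)))
```

Usage would be candidate_about_ids(fetch_html(topic_url)), then checking which candidate matches the ID in the captured batch request before using it to page through the articles.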

