
How to scrape an infinite load page with R

I've been scraping some news media pages for a small research project, and I've run into a page where the articles load as you scroll. I tried monitoring the responses in the F12 inspector and figured out (from the XHR requests) that further batches of articles do indeed load separately after a certain number of articles.

I can even see the URLs to them in the inspector; here's a screenshot to show what I mean:

[screenshot: XHR requests in the network inspector]

However, I have no idea how to load this information into R. I've been using rvest, but I'm not sure it helps here; perhaps httr?

Thank you!

You can page through the raw JSON at this endpoint:

http://kolumbus-api.lrytas.lt/query/?count=&tag_slugs=politika&type=Video,Articolo&order=pubfromdate-&ret_fields=props.type__AS__type,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)][%27hd-alternate-href%27]__AS__thumb,props.categories[0].name__AS__category,props.href__AS__href,props.title__AS__title,props.commentCount__AS__commentCount,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.1280x720.href__AS__imgxl,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.300x200.href__AS__imgm,props.media__AS__media_json&page=1

Just page through by incrementing the page parameter at the very end (page=1, page=2, and so on) until you reach the end.
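A minimal sketch of that paging loop in R, using jsonlite to read the JSON directly. The query URL is the one given above; the stopping condition (treating an empty result set as the last page) is an assumption you should check against a real response:

```r
# Full query URL from the answer above; page=1 is the part we will vary.
full_url <- "http://kolumbus-api.lrytas.lt/query/?count=&tag_slugs=politika&type=Video,Articolo&order=pubfromdate-&ret_fields=props.type__AS__type,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)][%27hd-alternate-href%27]__AS__thumb,props.categories[0].name__AS__category,props.href__AS__href,props.title__AS__title,props.commentCount__AS__commentCount,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.1280x720.href__AS__imgxl,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.300x200.href__AS__imgm,props.media__AS__media_json&page=1"

# Swap the trailing page number for the one we want.
page_url <- function(page) {
  sub("page=1$", paste0("page=", page), full_url)
}

# jsonlite::fromJSON can read JSON straight from a URL
# (requires the jsonlite package).
fetch_page <- function(page) {
  jsonlite::fromJSON(page_url(page))
}

# Loop over pages until the payload comes back empty.
# Assumed stopping condition: an empty result set means we ran
# past the last page -- adjust after inspecting a real response.
scrape_all <- function(max_pages = 100) {
  pages <- list()
  for (p in seq_len(max_pages)) {
    res <- fetch_page(p)
    if (length(res) == 0 || NROW(res[[1]]) == 0) break
    pages[[p]] <- res
  }
  pages
}

# Usage (hits the live API):
# articles <- scrape_all()
```

From there you can bind the per-page results into one data frame (e.g. with `do.call(rbind, ...)` or `dplyr::bind_rows`) once you've confirmed the shape of the payload.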

