
How to scrape an infinite load page with R

I've been scraping some news media pages for a small research project, and I've run into a page where the articles load as you scroll. I tried monitoring the responses in the F12 inspector and figured out (from the XHR requests) that further batches of articles do indeed load separately after a certain number of articles.

I can even see the URLs to them in the inspector; here's a screenshot to show what I mean:

[screenshot: XHR requests in the network inspector]

However, I have no idea how to load this information into R. I've been using rvest, but I'm not sure it helps here; perhaps httr?

Thank you!

You can page through the raw JSON at this endpoint:

http://kolumbus-api.lrytas.lt/query/?count=&tag_slugs=politika&type=Video,Articolo&order=pubfromdate-&ret_fields=props.type__AS__type,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)][%27hd-alternate-href%27]__AS__thumb,props.categories[0].name__AS__category,props.href__AS__href,props.title__AS__title,props.commentCount__AS__commentCount,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.1280x720.href__AS__imgxl,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.300x200.href__AS__imgm,props.media__AS__media_json&page=1

Just page through by incrementing the page parameter at the very end (page=1, page=2, and so on) until you reach the end.
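A minimal sketch of that paging loop in R, using jsonlite to read the JSON directly. The query URL is the one given above; the stopping condition (treating an empty result set as the last page) is an assumption you should check against a real response:

```r
# Full query URL from the answer above; page=1 is the part we will vary.
full_url <- "http://kolumbus-api.lrytas.lt/query/?count=&tag_slugs=politika&type=Video,Articolo&order=pubfromdate-&ret_fields=props.type__AS__type,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)][%27hd-alternate-href%27]__AS__thumb,props.categories[0].name__AS__category,props.href__AS__href,props.title__AS__title,props.commentCount__AS__commentCount,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.1280x720.href__AS__imgxl,props.media[indexof(x.type=%27media%27%20for%20x%20in%20props.media)].otheralternate.300x200.href__AS__imgm,props.media__AS__media_json&page=1"

# Swap the trailing page number for the one we want.
page_url <- function(page) {
  sub("page=1$", paste0("page=", page), full_url)
}

# jsonlite::fromJSON can read JSON straight from a URL
# (requires the jsonlite package).
fetch_page <- function(page) {
  jsonlite::fromJSON(page_url(page))
}

# Loop over pages until the payload comes back empty.
# Assumed stopping condition: an empty result set means we ran
# past the last page -- adjust after inspecting a real response.
scrape_all <- function(max_pages = 100) {
  pages <- list()
  for (p in seq_len(max_pages)) {
    res <- fetch_page(p)
    if (length(res) == 0 || NROW(res[[1]]) == 0) break
    pages[[p]] <- res
  }
  pages
}

# Usage (hits the live API):
# articles <- scrape_all()
```

From there you can bind the per-page results into one data frame (e.g. with `do.call(rbind, ...)` or `dplyr::bind_rows`) once you've confirmed the shape of the payload.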

