How to scrape from a JavaScript-rendered index where links are in onclick?

Question

In Apify, I want to write a scraper for the content on the following index page: https://www.ampereanalysis.com/insights

The page seems to present two challenges:

It looks like the content links are themselves rendered by JavaScript.

Links to articles are not in standard href anchors. Here is an actual example...

 <div class="report-click" style="cursor:pointer;" onclick="window.location = '/insight/quibi-bets-on-daily-shows'"> <h6 class="report-title pt-2">Quibi...

How can I accomplish this in Apify?

Answer 1

If you are using a browser-based scraper and the content is loaded dynamically, you can wait for a specific condition or selector to appear, then parse the paths from the onclick attributes and join them with the site's hostname to reconstruct the full article URLs.

Example:

Array.from(document.querySelectorAll('.report-click'), el => location.origin + el.getAttribute('onclick').split("'")[1])

Docs for awaiting selectors in Puppeteer:

https://pptr.dev/#?product=Puppeteer&version=v3.0.0&show=api-pagewaitforselectorselector-options
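
Putting the two steps together, here is a minimal sketch rather than a tested Apify actor. It assumes `page` is a Puppeteer Page that has already navigated to the index, and that every onclick value follows the `window.location = '...'` pattern from the question. The helper names `onclickToUrl` and `collectArticleUrls` are made up for this illustration.

```javascript
// Pull the path out of an onclick value such as
// "window.location = '/insight/quibi-bets-on-daily-shows'"
// and resolve it against the page URL. Returns null if the pattern is absent.
function onclickToUrl(onclick, baseUrl) {
  const match = /window\.location(?:\.href)?\s*=\s*['"]([^'"]+)['"]/.exec(onclick || '');
  return match ? new URL(match[1], baseUrl).href : null;
}

// Assumption: `page` is a Puppeteer Page already pointed at the index page.
async function collectArticleUrls(page) {
  // Wait until the JavaScript-rendered tiles exist in the DOM.
  await page.waitForSelector('.report-click');

  // Read the raw onclick attribute of every tile inside the browser context.
  const onclicks = await page.$$eval('.report-click', (els) =>
    els.map((el) => el.getAttribute('onclick')));

  // Convert to absolute URLs in Node, dropping non-matching values and duplicates.
  const urls = onclicks
    .map((value) => onclickToUrl(value, page.url()))
    .filter(Boolean);
  return [...new Set(urls)];
}
```

The resulting URLs can then be added to the crawler's request queue in the same way as links taken from ordinary href anchors.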
