How do I scrape links held in onclick attributes from a JavaScript-rendered index?

Question

In Apify, I want to write a scraper for the content on the following index page: https://www.ampereanalysis.com/insights

The page seems to present two challenges:

The content links themselves appear to be rendered by JavaScript.

Links to articles are not in standard href anchors. Here is an actual example:

 <div class="report-click" style="cursor:pointer;" onclick="window.location = '/insight/quibi-bets-on-daily-shows'"> <h6 class="report-title pt-2">Quibi...

How can I accomplish this in Apify?

Answer 1

If you are using a browser-based scraper and the content is loaded dynamically, you can wait for a specific condition or selector to appear, then parse the paths from the onclick attributes and join them with the site's origin to reconstruct the full content URLs.

Example:

Array.from(document.querySelectorAll('.report-click'), el => new URL(el.getAttribute('onclick').split("'")[1], location.origin).href)
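
A slightly more defensive version of the same parsing step might look like the sketch below. It assumes every onclick value has the `window.location = '...'` shape shown in the question; the helper names `pathFromOnclick` and `urlsFromOnclicks` are made up for illustration, and values that do not match are simply skipped.

```javascript
// Pull the quoted target out of an onclick value such as
// "window.location = '/insight/quibi-bets-on-daily-shows'".
// Returns null when the value does not match that shape.
function pathFromOnclick(onclick) {
  const match = /window\.location(?:\.href)?\s*=\s*(['"])(.*?)\1/.exec(onclick || '');
  return match ? match[2] : null;
}

// Resolve each extracted path against the page URL and drop duplicates.
function urlsFromOnclicks(onclickValues, baseUrl) {
  const urls = onclickValues
    .map(pathFromOnclick)
    .filter((path) => path !== null)
    .map((path) => new URL(path, baseUrl).href);
  return [...new Set(urls)];
}

const sample = [
  "window.location = '/insight/quibi-bets-on-daily-shows'",
  "window.location = '/insight/quibi-bets-on-daily-shows'",
  'alert("not a link")',
];
console.log(urlsFromOnclicks(sample, 'https://www.ampereanalysis.com/insights'));
// prints [ 'https://www.ampereanalysis.com/insight/quibi-bets-on-daily-shows' ]
```

Using the `URL` constructor instead of string concatenation means relative and absolute paths are both resolved correctly against the index page.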

Docs for awaiting selectors in Puppeteer:

https://pptr.dev/#?product=Puppeteer&version=v3.0.0&show=api-pagewaitforselectorselector-options
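
Putting the waiting and the parsing together in Puppeteer could look roughly like this. It is only a sketch: the function name `collectOnclickLinks`, the 30-second timeout, and the assumption that `.report-click` matches every article tile are illustrative choices, not something confirmed against the live page.

```javascript
// Wait for the JavaScript-rendered tiles, then read their onclick targets.
// `page` is a Puppeteer Page; the selector and timeout defaults are assumptions.
async function collectOnclickLinks(page, selector = '.report-click', timeout = 30000) {
  await page.waitForSelector(selector, { timeout });
  // $$eval runs the callback in the browser with all matching elements.
  const onclicks = await page.$$eval(selector, (els) =>
    els.map((el) => el.getAttribute('onclick') || ''));
  const links = [];
  for (const value of onclicks) {
    const match = /window\.location\s*=\s*'([^']+)'/.exec(value);
    if (match) links.push(new URL(match[1], page.url()).href);
  }
  return links;
}

// Example run; requires `npm install puppeteer` and network access.
async function main() {
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://www.ampereanalysis.com/insights');
    console.log(await collectOnclickLinks(page));
  } finally {
    await browser.close();
  }
}
// Uncomment to run against the live site:
// main().catch(console.error);
```

The returned URLs can then be handed to whatever queue or crawler is driving the page, so that each article is visited like an ordinary link.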

Answer 1: -1, 2020-04-20 11:17:05
