
Using Apps Script to scrape a JavaScript-rendered web page

I am struggling to put together a script that scrapes a JavaScript-rendered web page through Apps Script. I found the question "How to scrape Javascript rendered websites using Javascript?" here, but I don't know how to put the pieces together, for example how to load puppeteer. Any help would be appreciated.

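If you are building something that scrapes JavaScript-generated content, I recommend that you comply with the site's terms of use or try to find an API.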

You can try to scrape the initial HTML. Scraping the rendered HTML is extremely hard to do, because you would have to use a headless browser.
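As a rough illustration of scraping the initial HTML, here is a minimal sketch that fetches a page with `UrlFetchApp` and pulls out its `<title>`. The function names (`extractTitle`, `fetchInitialHtml`, `logPageTitle`) and the URL are placeholders I made up, and the code assumes the V8 runtime is enabled. Anything the page adds later through JavaScript will not be in this response.

```javascript
// Pure helper: pull the <title> text out of the raw HTML the server returned.
// Returns null when no <title> element is found.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Fetch the initial (server-side) HTML of a page.
// UrlFetchApp exists only inside the Apps Script runtime.
function fetchInitialHtml(url) {
  const response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) {
    throw new Error('Request failed with status ' + response.getResponseCode());
  }
  return response.getContentText();
}

// Hypothetical entry point to run from the Apps Script editor.
function logPageTitle() {
  const html = fetchInitialHtml('https://example.com/'); // placeholder URL
  Logger.log(extractTitle(html));
}
```

If the data you need is missing from this HTML, it is probably loaded by a separate request, which is where looking for an API becomes worthwhile.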

There is this library: https://github.com/tautologistics/node-htmlparser, which you can use to parse HTML from JavaScript. It is written for Node, but because it doesn't use any dependencies, you can just copy and paste the functions you need.

I'm afraid parsing is not a very easy task.
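To give a feel for the parsing step, here is a deliberately small sketch that collects the `href` values of anchor tags with a regular expression. This is not the node-htmlparser API, and `extractLinks` is a hypothetical helper. A regex like this only copes with simple, well-formed markup, which is part of why a real parser is the better choice for anything non-trivial.

```javascript
// Collect href values from <a> tags in an HTML string.
// Rough sketch: handles quoted attribute values only, and would also pick up
// attributes such as data-href; it does not understand comments or scripts.
function extractLinks(html) {
  const links = [];
  const anchorPattern = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let match;
  while ((match = anchorPattern.exec(html)) !== null) {
    // Group 1 holds double-quoted values, group 2 single-quoted ones.
    links.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return links;
}
```

In Apps Script you could pass it the string returned by `UrlFetchApp.fetch(url).getContentText()`.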

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address.