Scrape an external website that requires JavaScript to be triggered

Since PhantomJS has been abandoned, I would like to know whether there is an alternative method. For example, chrome-webdriver wouldn't be a good solution, as it wouldn't be able to run on a remote host such as Heroku.

So, is it somehow possible to scrape an external website that requires JavaScript to be triggered first? Note that it should be possible to run it from a Node.js application.

I was getting ready to put something together for you, then I thought better of it and googled it. Check out this build script; it seems to answer your question exactly.

https://github.com/stomita/heroku-buildpack-phantomjs

Set up a git branch and pull it locally if you have to, but this should work. Basically, you need to download the binary, then remote in and run "heroku run phantomjs" or "heroku run bin/phantomjs".
