Scrape an external website that requires JavaScript to be triggered

Since PhantomJS has been abandoned, I would like to know whether there is an alternative. For example, chrome-webdriver wouldn't be a good solution, as it wouldn't be able to run on a remote host such as Heroku.

So, is it somehow possible to scrape an external website that requires JavaScript to be triggered first? Note that it must be possible to run it from a Node.js application.

I was about to put something together for you, but then I thought better of it and googled it instead. Check out this buildpack; it seems to answer your question exactly.

https://github.com/stomita/heroku-buildpack-phantomjs

Set up a git branch and pull it locally if you have to, but this should work. Basically, you need to download the binary, then remote in and run "heroku run 'phantomjs'" or "heroku run 'bin/phantomjs'".
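Once the binary is available, calling it from a Node.js application could look roughly like the sketch below. It is not taken from the buildpack. It assumes the binary is reachable as "phantomjs" (or at whatever path you pass in), that a fixed one-second wait is enough for the page's scripts to finish, and the names scrape, buildPhantomArgs and phantom-render.js are made up for illustration.

```javascript
'use strict';
const { execFile } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// This script runs *inside* PhantomJS, not Node. It opens the URL passed as
// the first argument, lets the page's JavaScript run, prints the resulting
// HTML to stdout and exits.
const PHANTOM_SCRIPT = [
  "var system = require('system');",
  "var page = require('webpage').create();",
  "page.open(system.args[1], function (status) {",
  "  if (status !== 'success') { phantom.exit(1); return; }",
  "  // Assumption: 1000 ms is long enough for the page's scripts.",
  "  setTimeout(function () {",
  "    console.log(page.content);",
  "    phantom.exit(0);",
  "  }, 1000);",
  "});",
].join('\n');

// Build the argument list for the PhantomJS CLI: <script> <url>.
// Rejects anything that is not a well-formed http(s) URL.
function buildPhantomArgs(scriptPath, url) {
  const parsed = new URL(url); // throws TypeError on malformed input
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('Only http(s) URLs are supported: ' + url);
  }
  return [scriptPath, parsed.href];
}

// Write the PhantomJS script to a temp file, run the binary and resolve
// with whatever it printed (the rendered HTML).
// phantomBin is an assumption; on Heroku it might be 'bin/phantomjs'.
function scrape(url, phantomBin = 'phantomjs') {
  const scriptPath = path.join(os.tmpdir(), 'phantom-render.js');
  fs.writeFileSync(scriptPath, PHANTOM_SCRIPT);
  return new Promise((resolve, reject) => {
    execFile(
      phantomBin,
      buildPhantomArgs(scriptPath, url),
      { maxBuffer: 10 * 1024 * 1024 }, // rendered pages can be large
      (err, stdout) => (err ? reject(err) : resolve(stdout))
    );
  });
}
```

You would then call something like scrape('https://example.com/').then(html => ...) from your app and hand the HTML to whatever parser you already use. A fixed timeout is the crudest way to wait for the page; polling for a specific element would be more reliable if you know what to look for.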
