Get the page source after it has been rendered by a templating engine?

Question

I'm screen scraping a site that is very JS heavy. It uses a client-side templating engine that renders all the content. I tried using jQuery, which worked in the browser console but, obviously, not on the server (Node.js).

I looked at a few libraries for Python and Java, and they seem able to handle what I want, but I would prefer a JS solution that works with a Node server.

Is there any way to get the complete source of a page after it's rendered, using Node?

Answer 1

I personally love PhantomJS or Selenium, which do exactly that.

The docs/examples should work pretty much out of the box.

Answer 2

If you want to use a Node.js module, you might be interested in this:

https://github.com/sgentle/phantomjs-node

or this:

https://github.com/alexscheelmeyer/node-phantom

Answer 3

I used jsdom for screen scraping; the code is below:

var jsdom = require('jsdom');

// Note: jsdom.env() is the legacy API; it was removed in jsdom 10.
jsdom.env({
  url: 'http://example.com/', // placeholder: URL of the page you want to scrape
  scripts: ['http://code.jquery.com/jquery.js'], // inject jQuery into the page
  done: function (error, window) {
    if (error) {
      console.error(error);
      return;
    }
    var $ = window.$;

    // The requested page is now loaded and available through $.
    // Write any JavaScript or jQuery code here to extract what you need.
  }
});
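
For readers on a newer jsdom (version 10 or later, where `jsdom.env` no longer exists), the same idea might look roughly like the sketch below. It assumes jsdom is installed via npm and that the page's own scripts can be trusted, since `runScripts: 'dangerously'` executes them. The names `waitFor`, `getRenderedSource`, and `readySelector` are illustrative and not part of jsdom; the selector is whatever element you expect the client-side template to produce.

```javascript
// Sketch only: assumes jsdom >= 10 is installed (npm install jsdom).
// waitFor and getRenderedSource are hypothetical helper names, not jsdom APIs.

// Poll until check() returns a truthy value, or reject once timeoutMs has elapsed.
function waitFor(check, timeoutMs = 5000, intervalMs = 50) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    (function poll() {
      let result;
      try {
        result = check();
      } catch (err) {
        return reject(err);
      }
      if (result) return resolve(result);
      if (Date.now() >= deadline) {
        return reject(new Error('Timed out waiting for rendered content'));
      }
      setTimeout(poll, intervalMs);
    })();
  });
}

// Load url, let the page's scripts run, and return the serialized DOM
// once an element matching readySelector exists.
async function getRenderedSource(url, readySelector) {
  // Required lazily so waitFor stays usable without jsdom installed.
  const { JSDOM } = require('jsdom');
  const dom = await JSDOM.fromURL(url, {
    runScripts: 'dangerously', // runs the page's scripts: trusted pages only
    resources: 'usable',       // also fetch external <script> files
  });
  try {
    await waitFor(() => dom.window.document.querySelector(readySelector));
    return dom.serialize();
  } finally {
    dom.window.close(); // stop timers and release the window
  }
}

// Usage (hypothetical URL and selector):
// getRenderedSource('http://example.com/', '#content').then(console.log);
```

Polling for a selector is a design choice rather than a requirement: client-side templates usually render some time after the initial load, so waiting for a known element tends to be more reliable than serializing the DOM immediately. jsdom is not a full browser, so pages that depend on layout or newer browser APIs may still need PhantomJS or Selenium as suggested in the other answers.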

3 solutions

Solution 1
2 2014-06-08 18:34:30

Solution 2
1 2014-06-08 18:37:28

Solution 3
1 Accepted 2014-06-08 18:41:35
