
Getting source of a page after it's rendered in a templating engine?

So I'm doing some screen scraping on a site that is very JS-heavy. It uses a client-side templating engine that renders all the content. I tried using jQuery, and that worked in the browser console but, obviously, not on the server (Node.js).

I looked at a few libraries for Python and Java, and they seem to be able to handle what I want, but I would prefer a JS solution that works with a Node server.

Is there any way to get the complete source of a page after it's rendered, using Node?

I personally love PhantomJS or Selenium, which do exactly that.

The docs/examples should work pretty much out of the box.

If you want to use a Node.js module, then you might be interested in this:

https://github.com/sgentle/phantomjs-node

or this:

https://github.com/alexscheelmeyer/node-phantom
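For the phantomjs-node route, a minimal sketch might look like the following. It assumes the promise-based API of recent releases of the `phantom` npm package (`create`, `createPage`, `open`, `property('content')`, `exit`); older callback-style releases differ. The function name `getRenderedHtml` is hypothetical, and the module is passed in as an argument only to keep the sketch easy to try out. Note also that the client-side templates may not have finished rendering at the moment `open` resolves, so a real scraper may need to wait for a specific element first.

```javascript
// Sketch only: fetch a page in PhantomJS and return the DOM as HTML
// after the page's own scripts have had a chance to run.
// `phantom` is assumed to be the module returned by require('phantom').
async function getRenderedHtml(phantom, url) {
  const instance = await phantom.create();
  try {
    const page = await instance.createPage();

    // open() resolves with 'success' or 'fail'.
    const status = await page.open(url);
    if (status !== 'success') {
      throw new Error('Could not open ' + url + ' (status: ' + status + ')');
    }

    // 'content' is the serialized DOM, not the raw HTTP response body.
    return await page.property('content');
  } finally {
    // Always shut down the PhantomJS process, even on errors.
    await instance.exit();
  }
}

// Hypothetical usage (requires `npm install phantom`):
//   const phantom = require('phantom');
//   getRenderedHtml(phantom, 'http://example.com/')
//     .then(function (html) { console.log(html); })
//     .catch(console.error);
```

The returned string can then be handed to whatever parser you like on the Node side.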

I used jsdom for screen scraping, and the code goes here:

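var jsdom = require('jsdom');

// Note: jsdom.env() is the callback API of older jsdom releases (before v10).
jsdom.env({
  // Placeholder: replace with the URL of the page you want to scrape.
  url: '<give_url_of_page_u_want_to_scrape>',
  // Inject jQuery into the loaded page so it is available as window.$.
  scripts: ['http://code.jquery.com/jquery.js'],
  done: function (error, window) {
    if (error) {
      console.error(error);
      return;
    }
    var $ = window.$;

    // The requested page is now loaded and can be queried through $.
    // Write any JavaScript or jQuery code here to extract what you need.
  }
});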

