Getting source of a page after it's rendered in a templating engine?
I'm screen scraping a site that is very JS heavy. It uses a client-side templating engine that renders all the content. I tried using jQuery, and that worked in the browser console but, obviously, not on the server (Node.js).
I looked at a few libraries for Python and Java, and they seem able to handle what I want, but I would prefer a JS solution that works with a Node server.
Is there any way to get the complete source of a page after it's rendered, using Node?
If you want to use a Node.js module, then you might be interested in this:
https://github.com/sgentle/phantomjs-node
or this:
https://github.com/alexscheelmeyer/node-phantom
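For what it's worth, here is a minimal sketch of how the first module might be used to grab the rendered markup. It assumes a promise-based release of the `phantom` npm package (the phantomjs-node project) and its `create`, `createPage`, `open`, `property('content')` and `exit` calls. The function name `getRenderedSource` and its optional `phantomModule` parameter are made up for illustration; they are not part of the library.

```javascript
// Sketch only: return the HTML of a page after PhantomJS has run its scripts.
// `getRenderedSource` and `phantomModule` are hypothetical names.
async function getRenderedSource(url, phantomModule) {
  // Load the real package lazily so a stand-in can be supplied instead.
  const phantom = phantomModule || require('phantom');
  const instance = await phantom.create();
  try {
    const page = await instance.createPage();
    const status = await page.open(url);
    if (status !== 'success') {
      throw new Error('Failed to load ' + url + ' (status: ' + status + ')');
    }
    // `content` is the serialized DOM as it stands right now,
    // i.e. including whatever the page's scripts have rendered so far.
    return await page.property('content');
  } finally {
    // Always shut down the PhantomJS process, even on failure.
    await instance.exit();
  }
}

// Usage (requires `npm install phantom`):
// getRenderedSource('http://example.com/').then(console.log, console.error);
```

If the templating engine renders asynchronously after the page has loaded, the content may still be incomplete at this point, so you may need to wait or poll for a known element before reading it.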
I used jsdom for screen scraping, and the code goes here:
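var jsdom = require('jsdom');

// Note: jsdom.env() is the callback API of older jsdom releases (it was removed in v10).
jsdom.env({
  // Replace with the URL of the page you want to scrape.
  url: '<url_of_page_you_want_to_scrape>',
  // Inject jQuery into the loaded page.
  scripts: ['http://code.jquery.com/jquery.js'],
  done: function (error, window) {
    if (error) {
      console.error(error);
      return;
    }
    var $ = window.$;
    // The page is now loaded and can be queried through $;
    // use any JavaScript or jQuery code here to extract what you need.
  }
});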
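The `jsdom.env` call above only exists in older jsdom releases. With a current jsdom, something along these lines might do the same job. This is a rough sketch, assuming the `JSDOM.fromURL` API with the `runScripts` and `resources` options; the function name `renderWithJsdom` and its optional `jsdomModule` parameter are made up for illustration.

```javascript
// Sketch only: load a URL in jsdom, let the page's own scripts run,
// and return the resulting markup.
// `renderWithJsdom` and `jsdomModule` are hypothetical names.
async function renderWithJsdom(url, jsdomModule) {
  // Load the real package lazily so a stand-in can be supplied instead.
  const { JSDOM } = jsdomModule || require('jsdom');
  const dom = await JSDOM.fromURL(url, {
    runScripts: 'dangerously', // execute the page's scripts (only for sites you trust)
    resources: 'usable',       // also fetch external scripts referenced by the page
  });
  // The promise can resolve before external scripts have finished,
  // so wait for the window's load event if it has not fired yet.
  if (dom.window.document.readyState !== 'complete') {
    await new Promise(function (resolve) {
      dom.window.addEventListener('load', resolve);
    });
  }
  const html = dom.serialize();
  dom.window.close(); // release timers and other resources held by the window
  return html;
}

// Usage (requires `npm install jsdom`):
// renderWithJsdom('http://example.com/').then(console.log, console.error);
```

As with any headless approach, content that the templating engine renders after the load event (for example, following an AJAX call) may need an extra wait before serializing.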