
Get HTML rendered by JavaScript using PhantomJS

I am trying to use PhantomJS to get the HTML generated by a dynamic page. I assumed this would be easy, but after a few hours of trying I have had no luck.

The page has the following source code, which is also what eventually gets saved in 1.html:

<!doctype html>
<html lang="cs" ng-app="appId">
<head ng-controller="MainCtrl">
     (omitted some lines)
    <script src="/js/conf/config.js?pars"></script>
    <script src="/js/all.js?pars"></script>
</head>
<body>
<!--<![endif]-->
    <div site-loader></div>
    <div page-layout>
        <div ng-view></div>
    </div>
</body>
</html>

All of the site's content gets loaded inside the site-loader div, but I cannot get at it, even though I use a timeout before PhantomJS scrapes the HTML. Here is the code I am using:

var url = 'http:...';
var page = require('webpage').create();
var fs = require('fs');

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Fail');
        phantom.exit();
    } else {        
        window.setTimeout(function () {
            fs.write('1.html', page.content, 'w');
            phantom.exit();
        }, 2000); // Change timeout as required to allow sufficient time
    }
});

What am I doing wrong?

EDIT: I decided to try the PJscrapper framework and configured it to scrape the entire contents of the div block. All I got was this:

["","\n\t\tif (window.DOT) {\n\t\t\tDOT.cfg({service: 'sreality', impress: false});\n\t\t}\n\t","","Loader.load()","",""]

It seems I always get the code as it is before Loader.load() runs, and the timeout evidently does not solve it.

This should do the trick: read the rendered DOM from inside the page with page.evaluate once the timeout fires.

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the url!');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            var results = page.evaluate(function() {
                return document.documentElement.innerHTML;
            });
            console.log(results);
            phantom.exit();
        }, 200);
    }
});
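
Since the question reports that a fixed timeout did not help, one alternative worth trying is to poll the page until the content actually shows up, rather than guessing a delay. The following is only a sketch under assumptions: waitFor is a hypothetical helper (not part of the PhantomJS API), the '[site-loader]' selector is a guess based on the markup in the question, and the 10 second limit and 250 ms interval are arbitrary. The url value is left elided exactly as in the question.

```javascript
// Hypothetical helper: call `condition` every `intervalMs` milliseconds until
// it returns a truthy value (then call onReady) or until `timeoutMs` has
// elapsed (then call onTimeout). ES5 syntax only, so it also runs in PhantomJS.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (condition()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// The part below only runs under PhantomJS, where the `phantom` global exists.
if (typeof phantom !== 'undefined') {
    var url = 'http:...'; // elided in the question; fill in the real address
    var page = require('webpage').create();
    var fs = require('fs');

    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('Fail');
            phantom.exit(1);
            return;
        }
        waitFor(function () {
            // Evaluated inside the page. Assumption: the content counts as
            // rendered once the site-loader element has child elements.
            return page.evaluate(function () {
                var el = document.querySelector('[site-loader]');
                return !!el && el.children.length > 0;
            });
        }, function () {
            fs.write('1.html', page.content, 'w');
            phantom.exit();
        }, function () {
            console.log('Timed out waiting for rendered content');
            phantom.exit(1);
        }, 10000, 250);
    });
}
```

If the condition never becomes true, the timeout branch at least tells you that the content did not render at all within the limit, which is a different problem from reading the DOM too early.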

