
How to get page HTML before it gets changed by JavaScript in PhantomJS

I'm trying to get the page's original HTML so that I can diff it against the DOM after the scripts have run, but I can't find a way to do it.

I checked the Web Page Module API page: http://phantomjs.org/api/webpage/

But every event gives me either the HTML after the page scripts have already modified it, or an empty HTML structure.

There is no API call for that, but you can easily download the original page source with a separate XHR issued from inside the loaded page:

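var originalSource = page.evaluate(function(){
    // Runs inside the page: request the current URL again to get the unmodified source.
    var xhr = new XMLHttpRequest();
    xhr.open("GET", window.location.href, false); // false = synchronous request
    xhr.send();
    return xhr.responseText;
});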
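For context, here is a minimal sketch of how the XHR approach might fit into a complete PhantomJS script that also compares the result with the rendered DOM. The URL is a placeholder and `addedLines` is a hypothetical helper, not part of PhantomJS; it does a naive line-by-line comparison, so differences in whitespace or attribute serialization will also show up as changes.

```javascript
// Hypothetical helper: returns the non-empty lines of `rendered`
// that do not appear (after trimming) anywhere in `original`.
function addedLines(original, rendered) {
    var originalLines = original.split('\n').map(function (line) {
        return line.trim();
    });
    return rendered.split('\n').filter(function (line) {
        var trimmed = line.trim();
        return trimmed !== '' && originalLines.indexOf(trimmed) === -1;
    });
}

// Only run the PhantomJS part when the script is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var url = 'http://example.com/'; // placeholder URL (assumption)

    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('Failed to load ' + url);
            phantom.exit(1);
            return;
        }

        // Original source: fetched again with a synchronous XHR inside the page.
        var originalSource = page.evaluate(function () {
            var xhr = new XMLHttpRequest();
            xhr.open('GET', window.location.href, false);
            xhr.send();
            return xhr.responseText;
        });

        // Rendered source: the DOM as serialized after the scripts have run.
        var renderedSource = page.content;

        addedLines(originalSource, renderedSource).forEach(function (line) {
            console.log('+ ' + line);
        });
        phantom.exit();
    });
}
```

Note that the XHR is a second request, so the server may return something different from what the browser originally received. For a real diff, a dedicated diff library would likely give better results than this line filter.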

However, this will probably not work if the page source depends on the session. Tweaking the request headers might help in that case. See also: Can I get the original page source (vs current DOM) with phantomjs/casperjs?
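One way the header tweaking could look is sketched below. `fetchOriginalSource` is a hypothetical helper, and the `Accept-Language` header is only an illustration; which headers actually matter depends on the site. Browsers refuse to set some headers (such as `Cookie`) through `setRequestHeader`, although cookies for the same origin are normally sent with the XHR anyway.

```javascript
// Hypothetical helper: synchronous GET with extra request headers.
// It is self-contained so that it can be passed to page.evaluate.
function fetchOriginalSource(url, headers) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, false); // false = synchronous request
    Object.keys(headers || {}).forEach(function (name) {
        xhr.setRequestHeader(name, headers[name]);
    });
    xhr.send();
    return xhr.responseText;
}

// Only run the PhantomJS part when the script is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var url = 'http://example.com/'; // placeholder URL (assumption)

    page.open(url, function (status) {
        if (status === 'success') {
            // Arguments after the function are passed into the page context.
            var source = page.evaluate(fetchOriginalSource, url, {
                'Accept-Language': 'en' // example header (assumption)
            });
            console.log(source);
        }
        phantom.exit();
    });
}
```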
