Getting a web page's HTML with PhantomJS

Question

I am trying to use PhantomJS to load a page (one that uses JavaScript to load items on the page) and return all of the page's HTML (at least what is inside the <body> tags) to the PHP function that executes phantomjs httpget.js.

Problem: I can get PhantomJS to return document.title, but asking it to console.log(document.body) simply gives me [object Object]. How can I extract the page's HTML?

Loading the page with PhantomJS also takes much longer than loading it in a browser.

httpget.js

console.log('hello!');
var page = require('webpage').create();
page.open("http://www.asos.com/Men/T-Shirts-Vests/Cat/pgecategory.aspx?cid=7616#parentID=-1&pge=0&pgeSize=900&sort=1",
    function(status){
        console.log('Page title is ' + page.evaluate(function () {
            return document.body;
        }));
        phantom.exit();
    });

Output (running from shell)

hello!
Page title is [object Object]

Answer 1

document.body.innerHTML contains the HTML of the body.
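As a rough sketch of how that property could slot into the question's script: the function passed to page.evaluate returns a string rather than the DOM node itself, since PhantomJS only passes simple, JSON-serializable values back from the page. The helper name getBodyHtml, the example.com URL, and the typeof phantom guard are illustrative assumptions, not part of the original script.

```javascript
// Hypothetical helper. It runs inside the page (via page.evaluate), so it
// returns a plain string; a DOM node such as document.body does not
// survive the trip back to the PhantomJS script.
function getBodyHtml() {
    return document.body.innerHTML;
}

// Only wire things up when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    // Placeholder URL (assumption); substitute the page you want to fetch.
    page.open('http://example.com/', function (status) {
        if (status !== 'success') {
            console.log('Failed to load the page');
            phantom.exit(1);
            return;
        }
        // Prints the markup found inside <body>.
        console.log(page.evaluate(getBodyHtml));
        phantom.exit();
    });
}
```

Run with phantomjs httpget.js, this should print the body markup to standard output, which the calling PHP function can then capture.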

Answer 2

Not sure what this has to do with Node.js, as you appear to be using PhantomJS directly, not Node (or PhantomJS via node-phantom)...

But to answer your question, you need to do this:

var html = page.evaluate(function () {
    var root = document.getElementsByTagName("html")[0];
    var html = root ? root.outerHTML : document.body.innerHTML;
    return html;
});

This works with pages that don't have an outer <html> tag.

Answer 3

According to the documentation, page.content gives you the entire HTML.
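A minimal sketch of that approach, assuming the same kind of script as in the question. page.content is read in the PhantomJS script itself, so no page.evaluate call is needed. The helper name htmlOrError, the example.com URL, and the typeof phantom guard are assumptions made for illustration.

```javascript
// Hypothetical helper: decides what to print once page.open has finished.
function htmlOrError(status, page) {
    return status === 'success' ? page.content : 'Failed to load the page';
}

// Only wire things up when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    // Placeholder URL (assumption); substitute the page you want to fetch.
    page.open('http://example.com/', function (status) {
        console.log(htmlOrError(status, page));
        phantom.exit(status === 'success' ? 0 : 1);
    });
}
```

Because the question's page fills in items with JavaScript after loading, the content read at this point may not yet include them; when exactly to read it is left open here.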

3 answers

Answer 1: score 2, accepted, 2012-08-20 00:58:56
Answer 2: score 2, 2012-08-20 20:15:33
Answer 3: score 0, 2012-08-21 02:37:53
