
Open tabs with PhantomJS like a real browser

PhantomJS is a headless WebKit browser. I can open a URL with it and get the content of a page that updates every second.

But I need to get the content of many (100) pages at the same time.

All pages must be open concurrently and refresh every second.

It works for one page, but I don't know how to retrieve content from multiple pages at once.

This is the example code from the PhantomJS website:

console.log('Loading a web page');
var page = require('webpage').create();
var url = 'http://www.phantomjs.org/';
page.open(url, function (status) {
  //Page is loaded!
  phantom.exit();
});

Can I run many PhantomJS instances at the same time? That doesn't seem like the best way. Does anybody know how to open just one PhantomJS instance and get content from several pages?
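A single PhantomJS process can hold many page objects at once: every `require('webpage').create()` call returns an independent page, and `page.open` is asynchronous, so several pages can load in parallel. The sketch below is an illustration under assumptions, not a tested recipe. It assumes the target page updates itself with JavaScript, so re-reading its live DOM through `page.content` on a timer is enough and no reload is needed. `watchPages` and its injected `createPage` parameter are hypothetical names; injecting the page factory only keeps the logic runnable outside PhantomJS. The code is written in ES5 because PhantomJS's JavaScript engine predates ES2015.

```javascript
// Hypothetical helper: watch several pages from ONE PhantomJS process.
// In PhantomJS, pass function () { return require('webpage').create(); }
// as createPage; injecting it keeps the logic testable elsewhere.
function watchPages(createPage, urls, intervalMs, onContent) {
    var pages = [];
    var timers = [];
    var stopped = false;

    urls.forEach(function (url) {
        var page = createPage();
        var started = false;
        pages.push(page);

        // page.open is asynchronous, so all pages load in parallel.
        page.open(url, function (status) {
            // Ignore callbacks that arrive after stop() or a second time.
            if (stopped || started) {
                return;
            }
            started = true;
            if (status !== 'success') {
                onContent(url, null, status);
                return;
            }
            // Report the loaded HTML, then re-read the live DOM on a timer
            // (assumes the page refreshes its own content via JavaScript).
            onContent(url, page.content, status);
            timers.push(setInterval(function () {
                onContent(url, page.content, status);
            }, intervalMs));
        });
    });

    // Stop polling and release every page.
    return function stop() {
        stopped = true;
        timers.forEach(function (timer) {
            clearInterval(timer);
        });
        pages.forEach(function (page) {
            page.close();
        });
    };
}
```

In PhantomJS it could be called as `var stop = watchPages(function () { return require('webpage').create(); }, urls, 1000, function (url, html, status) { /* use html */ });`, followed later by `stop()` and `phantom.exit()`. Every open page costs memory, so it is worth checking how 100 simultaneous pages behave on your machine.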

Here is the code I used before to parse the items of an e-shop and save the HTML code of each item page. It runs in a single PhantomJS instance and loads the URLs one after another.

I hope it helps!

var RenderUrlsToFile, system, url_string_for_array;
var arrayOfUrls = [];

system = require("system");

RenderUrlsToFile = function(urls, callbackPerUrl, callbackFinal) {
    var next, page, retrieve, urlIndex, webpage;
    var fs = require('fs');

    urlIndex = 0;
    webpage = require("webpage");
    page = null;

    // Close the finished page, report the result, then load the next URL.
    next = function(status, url, file) {
        page.close();
        callbackPerUrl(status, url, file);
        return retrieve();
    };

    // Take the next URL from the queue; call callbackFinal when it is empty.
    retrieve = function() {
        var url;
        if (urls.length > 0) {
            url = urls.shift();
            urlIndex++;
            page = webpage.create();
            page.viewportSize = {
                width: 800,
                height: 600
            };
            page.settings.userAgent = "Phantom.js bot";
            return page.open("http://" + url, function(status) {
                // One output file per page; replace the 'your_file_path' placeholder.
                var file = 'your_file_path' + urlIndex + '.html';
                if (status === "success") {
                    // Give the page's scripts 100 ms to run before saving its HTML.
                    return window.setTimeout((function() {
                        // Optional: page.render('parsed/' + urlIndex + '.png') saves a screenshot.
                        // evaluate() can only return simple values, so return the HTML string.
                        var html = page.evaluate(function () {
                            return document.documentElement.outerHTML;
                        });

                        fs.write(file, html, 'w');

                        return next(status, url, file);
                    }), 100);
                } else {
                    return next(status, url, file);
                }
            });
        } else {
            return callbackFinal();
        }
    };
    return retrieve();
};

if (system.args.length > 1) {
    // URLs passed on the command line (without the "http://" prefix).
    arrayOfUrls = Array.prototype.slice.call(system.args, 1);
} else {
    // ------------ MAIN PART OF THE CODE FOR YOUR QUESTION ------------
    // For example, to parse the items of an e-shop, I take the URL of the
    // first catalogue page and then use a "for" loop to add the URLs of the
    // following pages (here pages 2 to 19).
    url_string_for_array = "www.lamoda.ru/c/559/accs-muzhskieaksessuary/?genders=men&page=1";

    for (var k = 2; k < 20; k++) {
        url_string_for_array += ",www.lamoda.ru/c/559/accs-muzhskieaksessuary/?genders=men&page=" + k;
    }

    arrayOfUrls = url_string_for_array.split(',');
}

RenderUrlsToFile(arrayOfUrls, (function(status, url, file) {
    if (status !== "success") {
        return console.log("Unable to render '" + url + "'");
    } else {
        return console.log("Rendered '" + url + "'");
    }
}), function() {
    return phantom.exit();
});
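Since `RenderUrlsToFile` handles one URL at a time, it does not by itself give the parallel loading the question asks for. One possible approach, sketched here under assumptions, is a small worker pool inside the same PhantomJS process: start up to `maxConcurrent` pages, and each time one finishes, open the next URL from the shared queue. `renderConcurrently`, `maxConcurrent` and the injected `createPage` factory are hypothetical names. Unlike the original, this sketch hands the page HTML (read from `page.content`) to a callback instead of writing files, and it passes URLs to `page.open` unchanged, so they should include the scheme.

```javascript
// Hypothetical bounded-concurrency variant of RenderUrlsToFile.
// At most maxConcurrent pages are open at once in one PhantomJS process.
// In PhantomJS, pass function () { return require('webpage').create(); }
// as createPage.
function renderConcurrently(createPage, urls, maxConcurrent, onPage, onDone) {
    var queue = urls.slice(); // copy so the caller's array is left intact
    var active = 0;
    var finished = false;

    function startNext() {
        if (queue.length === 0) {
            // Nothing left to start: finish when the last open page is done.
            if (active === 0 && !finished) {
                finished = true;
                onDone();
            }
            return;
        }
        var url = queue.shift();
        var page = createPage();
        var handled = false;
        active++;
        page.open(url, function (status) {
            if (handled) {
                return; // ignore a repeated callback for the same page
            }
            handled = true;
            // Read the HTML before the page is closed.
            var html = status === 'success' ? page.content : null;
            page.close();
            active--;
            onPage(status, url, html);
            startNext(); // this worker moves on to the next queued URL
        });
    }

    // Launch the first batch; each finished page pulls in the next URL.
    var workers = Math.min(maxConcurrent, queue.length);
    for (var i = 0; i < workers; i++) {
        startNext();
    }
    if (workers === 0) {
        onDone(); // empty URL list
    }
}
```

Passing `urls.length` as `maxConcurrent` opens every page at once; a smaller limit trades speed for lower memory use. Call `phantom.exit()` from `onDone` when running it in PhantomJS.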

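The main part of the answer builds the page URLs by concatenating a comma-separated string and then splitting it again. A slightly simpler sketch pushes each URL straight into an array; `buildPageUrls` is a hypothetical helper name, while the URL pattern and the page range 1 to 19 are taken from the original loop.

```javascript
// Hypothetical helper: collect the catalogue page URLs in an array directly,
// instead of building a comma-separated string and splitting it.
function buildPageUrls(baseUrl, firstPage, lastPage) {
    var urls = [];
    for (var k = firstPage; k <= lastPage; k++) {
        urls.push(baseUrl + k);
    }
    return urls;
}

// Same URLs as the original loop: page=1 followed by pages 2 to 19.
var arrayOfUrls = buildPageUrls(
    'www.lamoda.ru/c/559/accs-muzhskieaksessuary/?genders=men&page=', 1, 19);
```

This also avoids surprises if a URL ever contains a comma, which the split-based version would break apart.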