
Wait for the end of a function execution in a PhantomJS script

I am working on a small program that opens a large number of web pages (one for each ID read from id.txt) and saves each page to a file.

var page = require('webpage').create();
var fs = require('fs');
var file_h = fs.open('id.txt', 'r'); // contains data like : myName-1111
var line = file_h.readLine();

while(line) {
    data = line.split("-");
    line = file_h.readLine();
    savePage(data[1]);
}

function savePage(id){
    page.open('http://www.myWebsite.com/'+id, function(){
        page.evaluate();
        fs.write("page/"+id+'.html', page.content, 'w');
    });
}

file_h.close();
phantom.exit();

At the moment, the saved files contain only empty html, head and body tags, without any content.

I think this is because I am not waiting for each page to load completely.

So is there a way to wait between loop iterations, so that each page is fully loaded before it is saved?

The problem is that the loop runs synchronously, but the page.open() call in the savePage function does not: it only starts loading the page and returns immediately, and its callback runs later. So while the loop is running, no page has finished loading, because each iteration triggers the next page.open() before the previous load is complete.

You might expect at least the last page to be fully loaded, but it is not either, because phantom.exit() is called right after the loop, before any of the page loads can finish.
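The ordering problem can be reproduced outside PhantomJS. The sketch below is a Node-runnable simulation, not PhantomJS code: a hypothetical `fakeOpen(url, callback)` stands in for `page.open()` and uses `setTimeout` so its callback runs later, the way a real page load finishes asynchronously. The `log` array records the order of events; `'exit'` (standing in for `phantom.exit()`) is recorded before any "page" has loaded.

```javascript
// Hypothetical stand-in for page.open(): the callback runs later, not immediately.
function fakeOpen(url, callback) {
  setTimeout(function () {
    callback('success');
  }, 10);
}

var log = [];
var ids = ['1111', '2222', '3333'];

// Same shape as the question's while loop: start every load and move on at once.
for (var i = 0; i < ids.length; i++) {
  (function (id) {
    fakeOpen('http://www.myWebsite.com/' + id, function () {
      log.push('loaded ' + id);
    });
  })(ids[i]);
}

// Stands in for phantom.exit(): this line runs before any callback above.
log.push('exit');

// Print the recorded order once all pending callbacks have had time to run.
setTimeout(function () {
  console.log(log);
}, 50);
```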

JavaScript has no sleep function; waiting is done asynchronously, through callbacks. So the way to solve this is recursion: start loading the next page only from inside the callback of the previous one.
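To see why the loop cannot simply wait in place, consider a busy-wait. The Node-runnable sketch below (plain JavaScript, no PhantomJS) schedules a callback with `setTimeout`, standing in for a page-load callback, and then spins in a loop waiting for it. Because a callback can only run after the currently executing code returns, the flag never changes while the loop spins; the 100 ms limit exists only so the example terminates.

```javascript
var loaded = false;

// Stands in for a page-load callback: scheduled to run after 10 ms.
setTimeout(function () {
  loaded = true;
}, 10);

// Busy-wait for up to 100 ms. This blocks the single JavaScript thread,
// so the callback above cannot run until the loop gives up.
var start = Date.now();
while (!loaded && Date.now() - start < 100) {
  // spin
}

console.log('after waiting, loaded = ' + loaded); // still false

// Once the current code has finished, the pending callback gets to run.
setTimeout(function () {
  console.log('later, loaded = ' + loaded); // now true
}, 20);
```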

Move the body of your while loop into the page.open() callback and remove the loop, then call the function once to start. The finish condition has to move into that function as well:

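var page = require('webpage').create();
var fs = require('fs');
var file_h = fs.open('id.txt', 'r'); // contains data like : myName-1111

function traverse(){
    var line = file_h.readLine();
    // Finish condition: no more IDs to read.
    if (!line) {
        file_h.close();
        phantom.exit();
        return; // make sure nothing below runs after phantom.exit()
    }
    var id = line.split("-")[1]; // "myName-1111" -> "1111"
    page.open('http://www.myWebsite.com/'+id, function(){
        // Runs only after the page has loaded: save it first,
        // then start loading the next one.
        fs.write("page/"+id+'.html', page.content, 'w');
        traverse();
    });
}

traverse();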
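The control flow of this `traverse()` pattern can be checked without PhantomJS. The sketch below is a Node-runnable simulation in which everything PhantomJS-specific is a hypothetical stand-in: `fakeFile.readLine()` returns lines from an in-memory list (and `""` once they run out), `fakePage.open()` fills `fakePage.content` after a timer, `written` collects what `fs.write()` would have saved, and `exited` marks where `phantom.exit()` would be called. Only the shape of `traverse()` follows the answer.

```javascript
// Hypothetical stand-in for the id.txt stream: returns "" when no lines are left.
var fakeFile = {
  lines: ['myName-1111', 'otherName-2222'],
  readLine: function () {
    return this.lines.length > 0 ? this.lines.shift() : '';
  }
};

// Hypothetical stand-in for the PhantomJS page object.
var fakePage = {
  content: '',
  open: function (url, callback) {
    var self = this;
    setTimeout(function () {
      self.content = '<html><body>' + url + '</body></html>';
      callback('success');
    }, 10);
  }
};

var written = {};   // path -> content, in place of fs.write()
var exited = false; // set where phantom.exit() would be called

function traverse() {
  var line = fakeFile.readLine();
  if (!line) {
    exited = true; // phantom.exit() in the real script
    return;
  }
  var id = line.split('-')[1];
  fakePage.open('http://www.myWebsite.com/' + id, function () {
    // Save the loaded content first, then start the next page.
    written['page/' + id + '.html'] = fakePage.content;
    traverse();
  });
}

traverse();
```

Each page is opened only after the previous one has been saved, and the exit point is reached exactly once, after the last ID.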

I finally wrote a program that works.

Here is the code:

var page = require('webpage').create();
var fs = require('fs');
var file_h = fs.open('id.txt', 'r'); // lines like: myName-1111

var line = file_h.readLine();
var data = line.split("-");
console.log("Reading id : "+data[1]);
savePage(data[1]);

function savePage(id){
    console.log("\n#### READING http://www.myWebsite.com/"+ id +" ####");
    page.open('http://www.myWebsite.com/'+id, function(){
        // This callback runs only after the page has loaded,
        // so page.content now holds the loaded document.
        console.log("#### WRITING "+ id +".html ####");
        fs.write("page/"+id+'.html', page.content, 'w');

        // Read the next ID only now, then recurse.
        line = file_h.readLine();
        if(line == ""){
            file_h.close();
            phantom.exit();
            return; // stop here: do not split() an empty line after exiting
        }
        data = line.split("-");
        savePage(data[1]);
    });
}
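The `page.open()` callback also receives a `status` argument (`'success'` or `'fail'`), which the code above ignores, so a failed load is still written to disk. If you do check it, the recursive design means the next step must be started on both branches; otherwise one failed page ends the chain before the exit point is reached. The Node-runnable sketch below illustrates this with a hypothetical `fakeOpen` (in place of `page.open()`) that fails for ID 2222, and a hypothetical `saveAll` helper whose `saved` list stands in for the `fs.write()` calls.

```javascript
// Hypothetical stand-in for page.open(): fails for ID 2222, succeeds otherwise.
function fakeOpen(url, callback) {
  setTimeout(function () {
    callback(url.indexOf('2222') !== -1 ? 'fail' : 'success');
  }, 10);
}

// Hypothetical helper: visit the IDs one after another and report the outcome.
function saveAll(ids, done) {
  var saved = [];
  var failed = [];
  function next(index) {
    if (index >= ids.length) {
      done(saved, failed); // where phantom.exit() would go
      return;
    }
    var id = ids[index];
    fakeOpen('http://www.myWebsite.com/' + id, function (status) {
      if (status === 'success') {
        saved.push(id);  // fs.write(...) in the real script
      } else {
        failed.push(id); // skip saving, but keep going
      }
      next(index + 1);   // continue on both branches
    });
  }
  next(0);
}

saveAll(['1111', '2222', '3333'], function (saved, failed) {
  console.log('saved: ' + saved.join(', ') + '; failed: ' + failed.join(', '));
});
```

Run with Node, this prints `saved: 1111, 3333; failed: 2222`.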

Enjoy!

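Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.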

 
© 2020-2024 STACKOOM.COM