
Wait for the end of a function execution in a PhantomJS script

I am working on a small program that opens a large number of web pages (one per ID read from id.txt) and saves each page to a file.

var page = require('webpage').create();
var fs = require('fs');
var file_h = fs.open('id.txt', 'r'); // contains data like : myName-1111
var line = file_h.readLine();

while(line) {
    data = line.split("-");
    line = file_h.readLine();
    savePage(data[1]);
}

function savePage(id){
    page.open('http://www.myWebsite.com/'+id, function(){
        page.evaluate();
        fs.write("page/"+id+'.html', page.content, 'w');
    });
}

file_h.close();
phantom.exit();

At the moment, each saved file contains only the html, head and body tags, without any content.

I think this happens because I am not waiting for the current page to load completely.

Is there a way to wait on each loop iteration until the full page has loaded, so that I can save it?

The problem is that the loop runs synchronously, but the page.open() call in savePage() does not. page.open() only starts the load and returns immediately, so the loop triggers the next page.open() before the previous page has finished loading.

You might think that at least the last page would load completely, but it doesn't, because phantom.exit() is called right after the loop, before any load has finished.
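
The ordering problem can be reproduced without PhantomJS. The sketch below is only a simulation: fakeOpen is a hypothetical stand-in for page.open() that runs its callback later via setTimeout, and events is just a log used for illustration.

```javascript
// Simulation only: fakeOpen is a hypothetical stand-in for page.open().
// Like page.open(), it returns immediately and runs its callback later.
var events = [];

function fakeOpen(id, callback) {
    setTimeout(function () {
        callback(id);
    }, 10);
}

var ids = ['1111', '2222', '3333'];
for (var i = 0; i < ids.length; i++) {
    events.push('requested ' + ids[i]);
    fakeOpen(ids[i], function (id) {
        events.push('loaded ' + id);
    });
}

// Reached before any callback has run; in the original script this is
// the point where phantom.exit() is called.
events.push('loop finished');

// Print the log once the simulated loads have completed.
setTimeout(function () {
    console.log(events.join('\n'));
}, 50);
```

The log shows all three requests and "loop finished" before the first "loaded" entry, which is why exiting right after the loop leaves nothing useful to save.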

JavaScript doesn't have a blocking sleep function. Waiting is done asynchronously, through callbacks. The simplest way to solve this is to use recursion: start the next page load from inside the callback of the previous one.
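
As a small illustration of asynchronous waiting, here is a minimal sketch using setTimeout, which is also available in PhantomJS scripts. waitThen is a hypothetical helper name, not part of any API.

```javascript
// waitThen is a hypothetical helper: it schedules `next` to run after
// `ms` milliseconds and returns immediately, so nothing is blocked.
var order = [];

function waitThen(ms, next) {
    setTimeout(next, ms);
}

waitThen(20, function () {
    order.push('after the wait');
    console.log(order.join(' -> '));
});

// This line runs first, because waitThen() does not pause the script.
order.push('right away');
```

Code that has to run after the wait must live inside the callback; anything written after the waitThen() call runs immediately.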

Move the body of your while loop into a function, remove the loop, and call that function again from inside the page.open() callback. Then call the function once to start. The finish condition also has to move into that function:

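var page = require('webpage').create();
var fs = require('fs');
var file_h = fs.open('id.txt', 'r'); // contains data like : myName-1111

function traverse(){
    var line = file_h.readLine();
    if (!line) {
        // no more IDs: clean up and stop
        file_h.close();
        phantom.exit();
        return; // do not fall through to page.open() after requesting exit
    }
    // extract the ID before opening the page, so the URL can use it
    var id = line.split("-")[1];
    page.open('http://www.myWebsite.com/'+id, function(){
        // save the loaded page first, then start the next load
        fs.write("page/"+id+'.html', page.content, 'w');
        traverse();
    });
}

traverse();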
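
The same pattern can be written as a small reusable helper. This is a sketch under assumptions, not the code above: processSequentially and fakeOpenPage are hypothetical names, and the loader is simulated with setTimeout so the snippet runs without PhantomJS. In a real script, the openPage argument would wrap page.open() and pass along the status string ('success' or 'fail') that PhantomJS hands to the callback.

```javascript
// Process ids one at a time: the next id is opened only after the
// callback for the previous one has fired.
function processSequentially(ids, openPage, onDone) {
    var results = [];
    var index = 0;

    function next() {
        if (index >= ids.length) {
            onDone(results); // finish condition; phantom.exit() would go here
            return;
        }
        var id = ids[index++];
        openPage(id, function (status) {
            results.push(id + ':' + status);
            next();
        });
    }

    next();
}

// Hypothetical stand-in for page.open(): reports 'fail' for the id 'bad'.
function fakeOpenPage(id, callback) {
    setTimeout(function () {
        callback(id === 'bad' ? 'fail' : 'success');
    }, 5);
}

var finished = null;
processSequentially(['1111', 'bad', '3333'], fakeOpenPage, function (results) {
    finished = results;
    console.log(results.join(', '));
});
```

Recording the status for each ID makes it possible to skip or report failed loads instead of writing an empty file for them.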

I finally wrote a program that works.

Here is the code:

var page = require('webpage').create();
var fs = require('fs');
var file_h = fs.open('id.txt', 'r');

var line = file_h.readLine();
var data = line.split("-");
console.log("Reading id : "+data[1]);
savePage(data[1]);

function savePage(id){
    console.log("\n#### READING http://www.myWebsite.com/"+ id +" ####");
    page.open('http://www.myWebsite.com/'+id, function(){
        console.log("#### WRITING "+ id +".html ####");
        fs.write("page/"+id+'.html', page.content, 'w');

        // read the next id only after the current page has been saved
        line = file_h.readLine();
        if(!line){
            // end of file: close it and stop here
            file_h.close();
            phantom.exit();
            return;
        }
        data = line.split("-");
        savePage(data[1]);
    });
}

Enjoy!
