Schedule PDF generation with phantom.js and node.js

I am new to node.js and phantom.js, so I'm not sure how to utilize them better than what I have done below.

I have dress price lists for hundreds of schools, each of which can be downloaded as a PDF from the respective school's page. Currently we generate the PDFs and upload them to the server overnight.

Now we want to use node.js and phantom.js to generate PDFs in bulk and automate the process as much as we can.

The links below are not price list pages, just sample URLs for testing PDF generation.

```

var schedule = require('node-schedule'),
    path = require('path'),
    childProcess = require('child_process'),
    phantomjs = require('phantomjs'),
    binPath = phantomjs.path,
    childArgs = [
        // phantomjs rasterize.js http://codefight.org codefight.pdf
        path.join(__dirname, 'rasterize.js'),
            'http://codefight.org/',
            'codefight.pdf',
            '400*300'
        ]

// add all the URLs and name of PDF here
var pdfSources = [
            ['codefight.pdf', 'http://codefight.org/'],
            ['dltr.pdf', 'http://dltr.org/']
        ];

// schedule generating PDFs
// running every minute for now to test
var j = schedule.scheduleJob('* * * * *', function(){

  // loop through the pdfSources and generate new PDFs
  pdfSources.forEach(function(item, index){

    // update childArgs
    childArgs[1] = item[1]; // pdf content source url
    childArgs[2] = item[0]; // pdf filename

    childProcess.execFile(binPath, childArgs, function(err, stdout, stderr) {
        // for some reason childArgs[2] always prints last item of pdfSources
        // not sure how to make it work :(
        console.log('New PDF - "' + childArgs[2] + '" generated!');
        console.log(err + stdout + stderr);
    });
  });
});

```

1. I would like to know why `console.log('New PDF - "' + childArgs[2] + '" generated!');` always prints the same output, i.e. `New PDF - "dltr.pdf" generated!`

2. Is there a better way to achieve the same thing with node.js & phantom.js and any improvements you would like to suggest?

Thank You!

Answer 1. The output is the same because `execFile` is asynchronous. In the `forEach` loop you assign a value to `childArgs[2]` and call `execFile`, but its callback is only queued. On the second iteration you overwrite `childArgs[2]` and call `execFile` again. By the time the callbacks run, `childArgs[2]` holds the last value you assigned to it. A workaround is to wrap `execFile` in a closure, as below. Note that the closure has to receive a copy of the array (`childArgs.slice()`), because passing `childArgs` itself only hands over a reference to the same array, which the next iteration still overwrites.

```
(function(cArgs){
    childProcess.execFile(binPath, cArgs, function(err, stdout, stderr) {
        console.log('New PDF - "' + cArgs[2] + '" generated!');
        console.log(err + stdout + stderr);
    });
})(childArgs.slice()); // pass a copy so later iterations cannot change cArgs
```

I have nothing to add to question 2.
