
node.js parallel execution

I am trying to learn parallel execution in node.js. I wrote the sample code below. However, the output is serial: first 0..99 gets printed, then 100..199.

I understand this is because node.js is inherently single threaded and, inside the loop, the thread is held by the for loop.

What I am trying to understand is: in what cases is this flow.parallel structure useful? Any I/O or database request will be asynchronous in node.js anyway. So why do we need flow.parallel?

var flow = require('nimble');

flow.parallel([
    function a(callback)
    {
        for (var i = 0; i < 100; ++i)
        {
            console.log(i);
        }
        callback();
    },
    function b(callback)
    {
        for (var i = 100; i < 200; ++i)
        {
            console.log(i);
        }
        callback();
    }
]);

In most cases when using a parallel flow like this, you won't be printing a bunch of numbers in a for-loop (which happens to block execution). When you register your functions, they are registered in the same order in which you defined them in the array you're passing to parallel: in the case above, function a first and function b second. Consequently, Node's event loop will call a() first, then b() at some later point. Because those for-loops are blocking and Node runs in a single thread, it must complete the entire for-loop within a() and return before the event loop takes control again, where b() is waiting in the queue to be processed similarly.

Why is a parallel flow-control construct useful? By design, you're not supposed to do blocking operations within Node (see your example). a() consumes the entire thread, then b() consumes the entire thread, before anything else gets to happen.

a()  b()
 |
 |
 |
 |
RET
     |
     |
     |
     |
    RET
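If you really did need two CPU-bound loops to interleave, you would have to yield to the event loop yourself between chunks of work. Here is a minimal sketch of that idea (no library involved; runChunked is a hypothetical helper, not part of nimble), using setImmediate to give the other task a turn:

```javascript
// A minimal sketch: interleave two CPU-bound "tasks" by yielding to the
// event loop after each item, instead of running a blocking for-loop.
var order = [];

function runChunked(name, count, done) {
  var i = 0;
  function step() {
    order.push(name + i); // record which task produced this item
    i++;
    if (i < count) {
      setImmediate(step); // yield so the other task gets a turn
    } else {
      done();
    }
  }
  setImmediate(step);
}

var remaining = 2;
function finished() {
  if (--remaining === 0) {
    console.log(order.join(' ')); // a0 b0 a1 b1 a2 b2 — the loops interleave
  }
}

runChunked('a', 3, finished);
runChunked('b', 3, finished);
```

Because setImmediate callbacks queued during one pass of the event loop run on the next pass, the two tasks now alternate instead of one blocking the other to completion.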

Now, say you are making a web application where a user may register and, at the same time, upload a picture. Your user registration might have code like this:

var newUser = {
  username: 'bob',
  password: '...', 
  email: 'bob@example.com',
  picture: '20140806-210743.jpg'
}

var file = path.join(img.IMG_STORE_DIR, newUser.picture);

flow.parallel([
  function processImage(callback) {
    img.process(function (err) {
      if (err) return callback(err); 

      img.save(file, function (err) {
        return callback(err); // err should be falsey if everything was good
      })
    });
  },
  function dbInsert(callback) {
    db.doQuery('insert', newUser, function (err, id) {
      return callback(err);
    });
  }
], function () {
  // send the results to the user now to let them know they are all registered! 
});

The inner functions here are non-blocking, and both call upon processing- or network-heavy operations. They are, however, fairly independent of each other: you don't need one to finish for the other to begin. Inside the functions whose code we can't see, more async calls with callbacks are being made, each one enqueueing another item for Node to process. Node will work through the queue, distributing the workload across the available CPU time.

We hope that something like this is now happening:

a = processImage
b = dbInsert
a()  b()
 |
      |
 |
      |
 |   
      |
 |
RET   |
     RET

If we had them in series, i.e. you must wait for the image to be fully processed before the db insert, you would have to do a lot of waiting. If I/O load is really high on your system, Node would be twiddling its thumbs waiting on the OS. By contrast, running them in parallel allows slow operations to yield to faster ones, at least in theory.
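To make the waiting concrete, here is a small standalone sketch (simulated I/O via setTimeout, no nimble required; fakeIo is a made-up stand-in for a slow disk or database call) showing that two ~50 ms operations take roughly 100 ms in series but only roughly 50 ms in parallel:

```javascript
// Simulated I/O: each "operation" just waits ~50 ms before calling back.
function fakeIo(callback) {
  setTimeout(callback, 50);
}

var start = Date.now();
var serialElapsed, parallelElapsed;

// Serial: the second operation starts only after the first finishes (~100 ms).
fakeIo(function () {
  fakeIo(function () {
    serialElapsed = Date.now() - start;
    console.log('serial done after ~' + serialElapsed + ' ms');
  });
});

// Parallel: both operations are started immediately (~50 ms total).
var pending = 2;
function done() {
  if (--pending === 0) {
    parallelElapsed = Date.now() - start;
    console.log('parallel done after ~' + parallelElapsed + ' ms');
  }
}
fakeIo(done);
fakeIo(done);
```

The parallel pair finishes in about the time of the single slowest operation, while the serial pair takes the sum of both.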

If Node does this by itself, why do we really need it? The key is in the 2nd argument that you've omitted.

nimble.parallel([a,b], function () {
  // both functions have now returned and called-back. 
}); 

You can now tell when both tasks are done. Node does not do this for you by default, so it can be a fairly useful thing to have.

The flow.parallel gives you reusable logic for determining when all the parallel operations have completed. Yes, if you just did db.query('one'); db.query('two'); db.query('three');, they would all execute in parallel by the nature of async, but you'd have to write boilerplate code to keep track of when they had all finished and whether any had encountered an error. It's that part that flow.parallel (or its counterpart in any flow-control library) provides.
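For illustration, that bookkeeping can be hand-rolled in a few lines. This is a sketch of the general technique, not nimble's actual source; the parallel helper and the dummy setImmediate tasks are hypothetical:

```javascript
// Hand-rolled version of the tracking a flow-control library provides:
// run all tasks, call finalCallback once when every task has called back,
// or immediately with the first error encountered.
function parallel(tasks, finalCallback) {
  var remaining = tasks.length;
  var failed = false;

  tasks.forEach(function (task) {
    task(function (err) {
      if (failed) return;            // an error was already reported
      if (err) {
        failed = true;
        return finalCallback(err);   // report only the first error
      }
      if (--remaining === 0) {
        finalCallback(null);         // every task has called back
      }
    });
  });
}

// Usage: two dummy async tasks that succeed on the next event-loop turn.
var result;
parallel([
  function (cb) { setImmediate(function () { cb(null); }); },
  function (cb) { setImmediate(function () { cb(null); }); }
], function (err) {
  result = err ? 'failed' : 'all done';
  console.log(result);
});
```

The counter plus the error flag is exactly the logic you would otherwise copy-paste around every group of concurrent operations.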


Reading a file directory in parallel using Node.js

create dir

mkdir Demo

create files

demo.txt,demo2.txt,demo3.txt

each file containing some text or paragraphs

create file word_count.js

 var fs = require('fs');

 var completedTasks = 0;
 var tasks = [];
 var wordCounts = {};
 var filesDir = './Demo'; // the directory created above

 function checkIfComplete() {
   completedTasks++;
   if (completedTasks == tasks.length) {
     for (var index in wordCounts) {
       console.log(index + ': ' + wordCounts[index]);
     }
   }
 }

 function countWordsInText(text) {
   var words = text
     .toString()
     .toLowerCase()
     .split(/\W+/)
     .sort();
   for (var index in words) {
     var word = words[index];
     if (word) {
       wordCounts[word] = (wordCounts[word]) ? wordCounts[word] + 1 : 1;
     }
   }
 }

 fs.readdir(filesDir, function (err, files) {
   if (err) throw err;
   for (var index in files) {
     var task = (function (file) {
       return function () {
         fs.readFile(file, function (err, text) {
           if (err) throw err;
           countWordsInText(text);
           checkIfComplete();
         });
       };
     })(filesDir + '/' + files[index]);
     tasks.push(task);
   }
   for (var task in tasks) {
     tasks[task]();
   }
 });
