Node.js: processing a big collection of data

I'm working with Mongoose in Node.js. I query a remote database to retrieve a collection of items. To build a full report, I need to go through the whole collection, which is large.

I want to avoid loading everything at once, as in:

model.find({}, function(err, data) {
  // process the bunch of data
})

For now, I use a recursive approach that pages through the collection and accumulates results in a local variable. When it finishes, I send back information about the processing in the response.

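app.get('/process/it/', (req, res) => {

  const processed_data = [];
  const mysize = 100; // page size

  function resolve(procdata) {
    res.json({status: "ok", items: procdata.length});
  }

  // Fetches one page starting at `start` and hands it to handler().
  // exec(callback) assumes a callback-style Mongoose (version 6 or earlier).
  function fetchPage(procdata, start, n) {
    mongoose.model('model').find({}).skip(start).limit(n).exec(function(err, data) {
      if (err) return res.status(500).json({status: "error"});
      handler(data, procdata, start + n, n);
    });
  }

  // `start` is the offset of the next page to fetch.
  function handler(data, procdata, start, n) {
    // do something with data: push into processed_data
    // (placeholder: the raw items are pushed unchanged)
    data.forEach((item) => procdata.push(item));

    if (data.length < n)
      resolve(procdata); // a short page means the collection is exhausted
    else
      fetchPage(procdata, start, n);
  }

  // first call
  fetchPage(processed_data, 0, mysize);

})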

Is there an approach that performs better, or simply a cleaner way to achieve this?

Any help would be appreciated.

The solution depends on the use case.

If the data doesn't change often once it has been processed, you could keep the processed data in a secondary database.

You can load unprocessed data from the primary database using pagination, the way you're doing right now, and load all processed data from the secondary database in a single query.
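
A minimal sketch of that split, assuming Mongoose 5 or later (where a query can be awaited directly). The `Item` and `Report` models, the `processed` flag, and the `processOne` callback are hypothetical names used only for illustration; the question does not say how processed items are told apart.

```javascript
// Sketch: cached results come from the secondary store in one query;
// only unprocessed documents are paged through on the primary.
// `Item`, `Report`, `processed` and `processOne` are hypothetical.
async function buildReport(Item, Report, processOne, pageSize = 100) {
  // Everything that was already processed, in a single query.
  const cached = await Report.find({});
  const fresh = [];

  // Page through what is still unprocessed, one batch at a time.
  for (let start = 0; ; start += pageSize) {
    const page = await Item.find({ processed: false })
      .sort({ _id: 1 }) // stable order so pages do not overlap
      .skip(start)
      .limit(pageSize);
    for (const doc of page) fresh.push(processOne(doc));
    if (page.length < pageSize) break; // short page: nothing left
  }
  return { cached: cached.length, fresh: fresh.length };
}
```

The loop replaces the recursion with plain `await`, so only one page is fetched at a time. The `fresh` array still grows with the number of unprocessed items, so this helps only if that set stays small.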

Your approach is fine as long as the data set is not too big, though performance may be poor. Once it reaches the gigabyte level, the application will simply break because the machine won't have enough memory to hold the data before sending it to the client. Sending gigabytes of report data will also take a long time. Here are some suggestions:

  • Try aggregating your data with MongoDB's aggregation framework instead of doing it in application code
  • Try to break the report data into smaller reports
  • Pre-generate the report data, store it somewhere (another collection, perhaps), and simply send it to the client when they need to see it
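
To illustrate the first suggestion, here is a minimal sketch that lets MongoDB compute a count and a total per group instead of pulling every document into Node. `category` and `amount` are hypothetical field names, and `Model` stands for whichever Mongoose model holds the data; the real pipeline depends on what the report needs.

```javascript
// Hypothetical report: count and sum per `category`, computed by MongoDB.
const reportPipeline = [
  { $group: { _id: '$category', count: { $sum: 1 }, total: { $sum: '$amount' } } },
  { $sort: { _id: 1 } },
];

// `Model` is any Mongoose model. aggregate() returns a thenable,
// so only the small summary rows ever reach the application.
async function summarize(Model) {
  const rows = await Model.aggregate(reportPipeline);
  return rows.map((r) => ({ category: r._id, count: r.count, total: r.total }));
}
```

In a route handler this could be used as `res.json(await summarize(mongoose.model('model')))`. Appending an `$out` stage to the pipeline would write the summary to another collection, which is one way to pre-generate the report as in the last suggestion.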

