
NodeJS, MongoDB, Mongoose Save Large Data Non-Blocking

I am currently developing a simple application using NodeJS, ExpressJS (with EJS), MongoDB and Mongoose. Below is a brief description of the issue I am facing; I would appreciate some suggestions.

Scenario

1) On a specific event a web service using SOAP is called and data is pulled.

2) The API returns around a million rows of data at a time

3) The pulled data is saved into MongoDB using Mongoose

Code

DB Model - (myModel.js)

var mongoose = require('mongoose')
var Schema = mongoose.Schema

var prodSchema = new Schema({
    rowIndex: {
        type: Number,
    },
    prodId: {
        type: String,
    },
    prodDesc: {
        type: String,
    },
    prodCategory: {
        type: String,
    }
});

module.exports = mongoose.model('Product', prodSchema);

Pulling the data attached to a controller - (app.js)

/**
 * Module dependencies.
 */

/* Express */
var express = require('express');
var http = require('http');
var path = require('path');
var fs = require('fs');
var bcrypt = require('bcrypt-nodejs');
var moment = require('moment');
var os = require('os');

var config = require('./config');

/* Mongoose (needed for mongoose.connect below) and Models */
var mongoose = require('mongoose');
var Product = require('./models/myModel');

var soap = require('soap');

var app = express();

/// Include the express body parser
app.configure(function () {
    app.use(express.bodyParser());
});

/* all environments */
app.engine('.html', require('ejs').__express);

app.set('port', process.env.PORT || 3000);
app.set('views', path.join(__dirname, 'views'));
app.set('view engine', 'html');

app.use(express.favicon());
app.use(express.logger('dev'));
app.use(express.json());
app.use(express.urlencoded());
app.use(express.methodOverride());
app.use(express.cookieParser('your secret here'));
app.use(express.session());
app.use(app.router);
app.use(express.static(path.join(__dirname, 'public')));

/* DB Connect */
mongoose.connect( 'mongodb://localhost:27017/productDB', function(err){
    if (err) throw err;
    if ('development' == app.get('env')) {
        console.log('Successfully connected to MongoDB globally');
    }
} );

/* Close DB gracefully */
var gracefulExit = function() { 
  mongoose.connection.close(function () {
    console.log('Mongoose default connection with DB is disconnected through app termination');
    process.exit(0);
  });
}

// If the Node process ends, close the Mongoose connection
process.on('SIGINT', gracefulExit).on('SIGTERM', gracefulExit);

/* development only */
if ('development' == app.get('env')) {
    app.use(express.errorHandler());
}

/********************************************************/
/***** GET *****/
/********************************************************/

app.get('/getproducts', getProducts);

/* On GET http://localhost:3000/getproducts, call the function below to pull data from the web service */
function getProducts(req, res){
    var post = req.body;
    var url = 'http://www.example.com/?wsdl';
    soap.createClient(url, function(err, client) {
        client.setSecurity(new soap.BasicAuthSecurity(User, Pass));
        client.someMethod(function(err, result) {
            var product = result.DATA.item;
            for(var i=0; i<product.length; i++) {
                var saveData = new Product({
                    rowIndex: product[i].ROW_INDEX,
                    prodId: product[i].PROD_ID,
                    prodDesc: product[i].PROD_DESC,
                    prodCategory: product[i].PROD_CATEGORY,
                 });
                 saveData.save();
            }           
        });
    });   
}

/* Create Server */
http.createServer(app).listen(app.get('port'), function(){    
    console.log('Express server listening on port ' + app.get('port') + ' in ' + app.get('env') + ' mode');
});

Data returned from the web service

[ { ROW_INDEX: '1',
    PROD_ID: 'A1',
    PROD_DESC: 'New product',
    PROD_CATEGORY: 'Clothes' },
  { ROW_INDEX: '2',
    PROD_ID: 'A2',
    PROD_DESC: 'New product 2',
    PROD_CATEGORY: 'Clothes' },
  { ROW_INDEX: '3',
    PROD_ID: 'A3',
    PROD_DESC: 'New product 3',
    PROD_CATEGORY: 'shoes' },
  .
  .
  . millions of rows
]

Problem/ Suggestion Needed

The issue I am facing is that until all the data has been saved to the database, the server is blocked: no other functions, such as rendering pages for concurrent users or saving more data, are executed.

I am also in the process of creating a view that returns the saved data. This will again be millions of rows, but this time fetched from MongoDB and passed to the view in EJS.

I would appreciate any help or suggestions on optimizing performance, running work in parallel, and handling this large amount of data.

You are firing off saves with no way to track them, and your handler never sends a response, so the request appears to hang until everything is written:

saveData.save();

Instead, pass a callback to save() (it runs once that save is finished) and respond to the client immediately:

function getProducts(req, res){
    var post = req.body;
    var url = 'http://www.example.com/?wsdl';
    soap.createClient(url, function(err, client) {
        client.setSecurity(new soap.BasicAuthSecurity(User, Pass));
        client.someMethod(function(err, result) {
            var product = result.DATA.item;
            for(var i=0; i<product.length; i++) {
                var saveData = new Product({
                    rowIndex: product[i].ROW_INDEX,
                    prodId: product[i].PROD_ID,
                    prodDesc: product[i].PROD_DESC,
                    prodCategory: product[i].PROD_CATEGORY,
                });
                saveData.save( function (err, data) {
                    // any statements here will run when this row is saved,
                    // but the loop will continue onto the next product without
                    // waiting for the save to finish
                });
            }           
        });
    });   
    res.send("Getting the data! You can browse the site while you wait...");
}

This way, the entire loop will run (virtually) instantly, and the data will get saved as it comes in. Meanwhile your node process is free to serve other web requests.
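A further refinement (my suggestion, not part of the original answer): rather than one save() per document, group the rows and insert each group with Mongoose's Model.insertMany (available since Mongoose 4.4), so the driver sends far fewer round trips. The chunk() helper below is the only new code; the insertMany call is sketched in the usage comment, since it needs a live database.

```javascript
// Split an array into batches of `size` (the last batch may be smaller).
function chunk(arr, size) {
    var out = [];
    for (var i = 0; i < arr.length; i += size) {
        out.push(arr.slice(i, i + size));
    }
    return out;
}

// Usage inside someMethod's callback (sketch, assumes Mongoose >= 4.4):
// chunk(result.DATA.item, 1000).forEach(function (batch) {
//     Product.insertMany(batch.map(function (p) {
//         return { rowIndex: p.ROW_INDEX, prodId: p.PROD_ID,
//                  prodDesc: p.PROD_DESC, prodCategory: p.PROD_CATEGORY };
//     }), function (err) { if (err) console.error(err); });
// });
```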

This sounds like a data replication problem, in that your data is not replicated over several nodes. I suggest examining how your MongoDB deployment is set up. Replication increases the availability of your service: one node responds to the initial request, leaving the others, which hold copies of the same data, free to respond to new reads and writes.

If all your reads are million-row reads, you may need a few nodes.

A quick Google search turned up this MongoDB tutorial on replication. Its opening paragraph states: "Replication provides redundancy and increases data availability."

http://docs.mongodb.org/manual/core/replication-introduction/
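For illustration, connecting to a replica set mostly means changing the connection string. The host names and replica-set name below are placeholders (not from the original post); `replicaSet` and `readPreference` are standard MongoDB connection-string options:

```javascript
// Hypothetical replica-set members. Listing several hosts lets the
// driver fail over if one goes down; readPreference=secondaryPreferred
// routes heavy reads to secondaries so million-row reads don't compete
// with writes on the primary.
var mongoose = require('mongoose');
mongoose.connect(
    'mongodb://db1.example.com:27017,db2.example.com:27017/productDB' +
    '?replicaSet=rs0&readPreference=secondaryPreferred',
    function (err) { if (err) throw err; }
);
```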

You may want to learn how to use Streams.

  1. Create a writable stream that inserts data into MongoDB when its _write() is called with some data. Better still, buffer the data and insert it in small batches.

  2. Create a transform stream that parses chunks of XML (as they arrive) into JSON objects. With luck you can find a ready-made streaming parser on npm.

  3. Obtain the SOAP response as a stream (a million items, but still one response stream) and pipe: soap response -> transform -> write. This might take some work.

  4. Listen for finish on the last stream and error on all streams; the operation ends the first time either is emitted. Then dismantle the pipe setup (unpipe).

  5. Streams can also be paused and resumed, so you can use this to keep the data flow in check.

The above approach does not increase the server's processing speed; it just keeps the server responsive. Build a cluster of dedicated XML-to-DB streaming services if you want to really scale up.
