
Bulk upsert in MongoDB using mongoose

Is there any option to perform bulk upserts with mongoose? So basically: given an array, insert each element if it does not exist, or update it if it does? (I am using custom _ids.)

When I use .insert, MongoDB returns error E11000 for duplicate keys (which should be updated instead). Inserting multiple new documents works fine, though:

var Users = self.db.collection('Users');

Users.insert(data, function(err){
            if (err) {
                callback(err);
            }
            else {
                callback(null);
            }
        });

Using .save returns an error that the parameter must be a single document:

Users.save(data, function(err){
   ...
}

This answer suggests there is no such option; however, it is specific to C# and is already 3 years old. So I was wondering if there is any way to do this using mongoose?

Thank you!

Not in "mongoose" specifically, or at least not yet as of writing. As of the 2.6 release, the MongoDB shell actually uses the "Bulk operations API" "under the hood" for all of the general helper methods. In its implementation, it tries to do this first, and if an older-version server is detected there is a "fallback" to the legacy implementation.

All of the mongoose methods "currently" use the "legacy" implementation and the write-concern response. But there is a .collection accessor on any given mongoose model that essentially exposes the "collection object" from the underlying "node native driver" on which mongoose itself is implemented:

 var mongoose = require('mongoose'),
     Schema = mongoose.Schema;

 mongoose.connect('mongodb://localhost/test');

 var sampleSchema  = new Schema({},{ "strict": false });

 var Sample = mongoose.model( "Sample", sampleSchema, "sample" );

 mongoose.connection.on("open", function(err,conn) { 

    var bulk = Sample.collection.initializeOrderedBulkOp();
    var counter = 0;

    // representing a long loop
    for ( var x = 0; x < 100000; x++ ) {

        bulk.find(/* some search */).upsert().updateOne(
            /* update conditions */
        );
        counter++;

        if ( counter % 1000 == 0 ) {
            bulk.execute(function(err,result) {
                // handle err/result for this flushed batch if needed
            });
            // Re-initialize synchronously so the loop does not keep
            // adding operations to an already-executed bulk object
            bulk = Sample.collection.initializeOrderedBulkOp();
        }
    }

    if ( counter % 1000 != 0 )
        bulk.execute(function(err,result) {
           // maybe do something with result
        });

 });

The main catch is that "mongoose methods" are actually aware that a connection may not have been made yet, and they "queue" operations until it is complete. The native driver you are "digging into" does not make this distinction.

So you really have to be aware that the connection is established in some way or form. But you can use the native driver methods as long as you are careful with what you are doing.

You don't need to manage the limit (1000) as @neil-lunn suggested. Mongoose does this already. I used his great answer as a basis for this complete Promise-based implementation & example:

var Promise = require('bluebird');
var mongoose = require('mongoose');

var Show = mongoose.model('Show', {
  "id": Number,
  "title": String,
  "provider":  {'type':String, 'default':'eztv'}
});

/**
 * Atomic connect Promise - not sure if I need this, might be in mongoose already..
 * @return {Promise}
 */
function connect(uri, options){
  return new Promise(function(resolve, reject){
    mongoose.connect(uri, options, function(err){
      if (err) return reject(err);
      resolve(mongoose.connection);
    });
  });
}

/**
 * Bulk-upsert an array of records
 * @param  {Array}    records  List of records to update
 * @param  {Model}    Model    Mongoose model to update
 * @param  {Object}   match    Database field to match
 * @return {Promise}  always resolves a BulkWriteResult
 */
function save(records, Model, match){
  match = match || 'id';
  return new Promise(function(resolve, reject){
    var bulk = Model.collection.initializeUnorderedBulkOp();
    records.forEach(function(record){
      var query = {};
      query[match] = record[match];
      bulk.find(query).upsert().updateOne( record );
    });
    bulk.execute(function(err, bulkres){
        if (err) return reject(err);
        resolve(bulkres);
    });
  });
}

/**
 * Map function for EZTV-to-Show
 * @param  {Object} show EZTV show
 * @return {Object}      Mongoose Show object
 */
function mapEZ(show){
  return {
    title: show.title,
    id: Number(show.id),
    provider: 'eztv'
  };
}

// if you are  not using EZTV, put shows in here
var shows = []; // giant array of {id: X, title: "X"}

// var eztv = require('eztv');
// eztv.getShows({}, function(err, shows){
//   if(err) return console.log('EZ Error:', err);

//   var shows = shows.map(mapEZ);
  console.log('found', shows.length, 'shows.');
  connect('mongodb://localhost/tv', {}).then(function(db){
    save(shows, Show).then(function(bulkRes){
      console.log('Bulk complete.', bulkRes);
      db.close();
    }, function(err){
        console.log('Bulk Error:', err);
        db.close();
    });
  }, function(err){
    console.log('DB Error:', err);
  });

// });

This has the bonus of closing the connection when it's done, displaying any errors if you care about them, but ignoring them if not (error callbacks in Promises are optional). It's also very fast. Just leaving this here to share my findings. You can uncomment the eztv stuff if you want to save all eztv shows to a database, as an example.

await Model.bulkWrite(docs.map(doc => ({
    updateOne: {
        filter: {id: doc.id},
        update: doc,
        upsert: true
    }
})))


Or, more verbosely:

const bulkOps = docs.map(doc => ({
    updateOne: {
        filter: {id: doc.id},
        update: doc,
        upsert: true
    }
}))

Model.bulkWrite(bulkOps)
        .then(bulkWriteOpResult => console.log('BULK update OK:', bulkWriteOpResult))
        .catch(err => console.error('BULK update error:', err))

https://stackoverflow.com/a/60330161/5318303
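Since building the operations array is plain data transformation, it can be sanity-checked without a database connection. A minimal sketch mirroring the mapping above (the toUpsertOps helper name is just for illustration):

```javascript
// Build updateOne upsert operations from an array of plain documents,
// mirroring the bulkWrite mapping used above. No database needed.
function toUpsertOps(docs, matchField = 'id') {
    return docs.map(doc => ({
        updateOne: {
            filter: { [matchField]: doc[matchField] },
            update: doc,
            upsert: true
        }
    }));
}

const ops = toUpsertOps([{ id: 1, title: 'a' }, { id: 2, title: 'b' }]);
console.log(ops.length);              // 2
console.log(ops[0].updateOne.filter); // { id: 1 }
```

Each resulting element is exactly the shape Model.bulkWrite() expects for an upsert.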

I have released a plugin for Mongoose that exposes a static upsertMany method to perform bulk upsert operations with a promise interface.

An added benefit of using this plugin over initializing your own bulk op on the underlying collection is that the plugin converts your data to Mongoose models first, and then back to plain objects before the upsert. This ensures Mongoose schema validation is applied, and that the data is depopulated and fit for raw insertion.

https://github.com/meanie/mongoose-upsert-many
https://www.npmjs.com/package/@meanie/mongoose-upsert-many

Hope it helps!

If you're not seeing the bulk methods on your db.collection, i.e. you're getting an error to the effect of "xxx variable has no method: initializeOrderedBulkOp()":

Try updating your mongoose version. Apparently older mongoose versions don't pass through all of the underlying mongo db.collection methods.

npm install mongoose

took care of it for me.

I had to achieve this recently while storing products in my e-commerce app. My database used to time out, as I had to upsert 10,000 items every 4 hours. One option was to set socketTimeoutMS and connectTimeoutMS in mongoose while connecting to the database, but that felt hacky and I did not want to manipulate the database's connection-timeout defaults. I also see that the solution by @neil-lunn takes a simple synchronous approach of taking a modulus inside the for loop. Here is an async version of mine that I believe does the job much better:

let BATCH_SIZE = 500
Array.prototype.chunk = function (groupsize) {
    var sets = [];
    var chunks = this.length / groupsize;

    for (var i = 0, j = 0; i < chunks; i++ , j += groupsize) {
        sets[i] = this.slice(j, j + groupsize);
    }

    return sets;
}

function upsertDiscountedProducts(products) {

    //Take the input array of products and divide it into chunks of BATCH_SIZE

    let chunks = products.chunk(BATCH_SIZE), current = 0

    console.log('Number of chunks ', chunks.length)

    let bulk = models.Product.collection.initializeUnorderedBulkOp();

    //Get the current time as timestamp
    let timestamp = new Date(),

        //Keep track of the number of items being looped
        pendingCount = 0,
        inserted = 0,
        upserted = 0,
        matched = 0,
        modified = 0,
        removed = 0,

        //If at least one upsert was performed
        upsertHappened = false;

    //Call the load function to get started
    load()
    function load() {

        //If we have a chunk to process
        if (current < chunks.length) {
            console.log('Current value ', current)

            for (let i = 0; i < chunks[current].length; i++) {
                //For each item set the updated timestamp to the current time
                let item = chunks[current][i]

                //Set the updated timestamp on each item
                item.updatedAt = timestamp;

                bulk.find({ _id: item._id })
                    .upsert()
                    .updateOne({
                        "$set": item,

                        //If the item is being newly inserted, set a created timestamp on it
                        "$setOnInsert": {
                            "createdAt": timestamp
                        }
                    })
            }

            //Execute the bulk operation for the current chunk
            bulk.execute((error, result) => {
                if (error) {
                    console.error('Error while inserting products' + JSON.stringify(error))
                    next()
                }
                else {

                    //At least one upsert has happened
                    upsertHappened = true;
                    inserted += result.nInserted
                    upserted += result.nUpserted
                    matched += result.nMatched
                    modified += result.nModified
                    removed += result.nRemoved

                    //Move to the next chunk
                    next()
                }
            })



        }
        else {
            console.log("Calling finish")
            finish()
        }

    }

    function next() {
        current++;

        //Reassign bulk to a new object and call load once again on the new object after incrementing chunk
        bulk = models.Product.collection.initializeUnorderedBulkOp();
        setTimeout(load, 0)
    }

    function finish() {

        console.log('Inserted ', inserted + ' Upserted ', upserted, ' Matched ', matched, ' Modified ', modified, ' Removed ', removed)

        //If at least one chunk was upserted, remove all items with a 0% discount or those not updated in the latest upsert
        if (upsertHappened) {
            console.log("Calling remove")
            remove()
        }


    }

    /**
     * Remove all the items that were not updated in the recent upsert or those items with a discount of 0
     */
    function remove() {

        models.Product.remove(
            {
                "$or":
                [{
                    "updatedAt": { "$lt": timestamp }
                },
                {
                    "discount": { "$eq": 0 }
                }]
            }, (error, obj) => {
                if (error) {
                    console.log('Error while removing', JSON.stringify(error))
                }
                else {
                    if (obj.result.n === 0) {
                        console.log('Nothing was removed')
                    } else {
                        console.log('Removed ' + obj.result.n + ' documents')
                    }
                }
            }
        )
    }
}
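The chunk helper used above is pure and can be checked in isolation; for example (the helper is repeated here so the sketch is self-contained):

```javascript
// Split an array into consecutive groups of at most `groupsize` elements.
Array.prototype.chunk = function (groupsize) {
    var sets = [];
    var chunks = this.length / groupsize;
    for (var i = 0, j = 0; i < chunks; i++, j += groupsize) {
        sets[i] = this.slice(j, j + groupsize);
    }
    return sets;
};

const parts = [1, 2, 3, 4, 5].chunk(2);
console.log(parts); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```

The final group simply holds whatever remains, so no item is dropped when the array length is not a multiple of BATCH_SIZE.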

You can use mongoose's Model.bulkWrite():

const res = await Character.bulkWrite([
  {
    updateOne: {
      filter: { name: 'Will Riker' },
      update: { age: 29 },
      upsert: true
    }
  },
  {
    updateOne: {
      filter: { name: 'Geordi La Forge' },
      update: { age: 29 },
      upsert: true
    }
  }
]);

Reference: https://masteringjs.io/tutorials/mongoose/upsert
