Efficient multi-document upsert in mongo

I have a Node.js app that updates a local MongoDB (3.0) data store from a remote API, and I'm trying to make it as efficient as possible.

Every record in the collection has a unique remoteId property. Each API call returns a set of records, and I then need to update the local documents whose remoteId matches a remote record, insert the remote records that have no local match, and mark documents that exist locally but not in the remote data set as inactive.

My current solution is this (mongoose code, with callbacks / promises stripped out for clarity; assume it runs synchronously):

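timestamp = new Date()
for item in remoteData
  # update the matching document, or insert one if there is no match
  collection.findOneAndUpdate { remoteId: item.remoteId }, { $set: { updatedAt: timestamp, /* other properties */ } }, { upsert: true }
# anything not touched in this run is missing from the remote data
collection.update { updatedAt: { $lt: timestamp } }, { $set: { active: false } }, { multi: true }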

This seems straightforward enough, but with tens of thousands of documents it gets quite slow, since every record costs a separate round trip to the database.

I looked at Bulk.find.upsert() in the MongoDB documentation, but that seems to work only when the queries used to find the documents are static.

What could I do here?

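It turns out I hadn't fully grasped the MongoDB Bulk API: I had missed that it is basically an array of commands that gets sent to the database when you call execute(). Each find() in that array can carry its own query, so the queries don't have to be static. In the end, this is what I had to do: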

timestamp = new Date()
bulkOp = collection.initializeUnorderedBulkOp()
for item in remoteData
  # queue one upsert per record; nothing is sent to the database yet
  bulkOp.find({ remoteId: item.remoteId }).upsert().updateOne { $set: { updatedAt: timestamp, /* other properties */ } }
# send all queued upserts together
bulkOp.execute()
collection.update { updatedAt: { $lt: timestamp } }, { $set: { active: false } }, { multi: true }
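For comparison, here is a rough sketch of the same flow in plain JavaScript rather than the CoffeeScript-style syntax above. It assumes a Node.js driver version that provides collection.bulkWrite() and collection.updateMany(), plus a Node.js version with async/await. The helper names buildUpsertOps and syncRemoteData are made up for illustration, and copying every field of the remote record into $set is an assumption standing in for "other properties".

```javascript
// Sketch only: buildUpsertOps and syncRemoteData are hypothetical helper names.

// Turn the remote records into one upsert operation per record.
// Pure function: it touches no database, so it is easy to inspect or test.
function buildUpsertOps(remoteData, timestamp) {
  return remoteData.map((item) => ({
    updateOne: {
      filter: { remoteId: item.remoteId },
      // Assumption: every field of the remote record is copied as-is.
      update: { $set: Object.assign({}, item, { updatedAt: timestamp }) },
      upsert: true,
    },
  }));
}

// `collection` is assumed to be a native driver collection
// (for a mongoose model, that would be Model.collection).
async function syncRemoteData(collection, remoteData) {
  const timestamp = new Date();
  const ops = buildUpsertOps(remoteData, timestamp);

  // Skip the call when there is nothing to send, since an empty batch
  // is typically rejected by the driver.
  if (ops.length > 0) {
    // ordered: false plays the same role as initializeUnorderedBulkOp().
    await collection.bulkWrite(ops, { ordered: false });
  }

  // Anything not touched in this run was absent from the remote data.
  return collection.updateMany(
    { updatedAt: { $lt: timestamp } },
    { $set: { active: false } }
  );
}
```

Building the operations in a separate function keeps the database calls down to two per sync, and makes it possible to check the generated upserts without a running MongoDB instance.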
