I have a node.js app that updates a local mongo (3.0) data store from remote API, and I'm trying to make it as efficient as possible.
Every record in the collection has a unique remoteId property. After calling the API I get a set of records. I then need to update the local documents with new properties where a document with a matching remoteId already exists, insert where one doesn't, and mark documents that exist locally but not in the remote data set as inactive.
My current solution is this (mongoose code, stripped out callbacks / promises for clarity, assume it runs synchronously):
timestamp = new Date
for item in remoteData
  collection.findOneAndUpdate { remoteId: item.remoteId }, { $set: { updatedAt: timestamp, /* other properties */ } }, { upsert: true }
collection.update { updatedAt: { $lt: timestamp } }, { $set: { active: false } }, { multi: true }
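With modern mongoose (promise-based, roughly v5+), the same loop can be sketched in plain JavaScript. This is only an illustration: `RemoteItem` is a hypothetical model name, and `updateMany` is the current equivalent of `update` with `multi: true`.

```javascript
// Sketch of the per-document upsert approach, assuming a mongoose model
// passed in as `RemoteItem` (hypothetical name). Note: this issues one
// round trip to the database per item, which is why it gets slow at scale.
async function syncNaive(RemoteItem, remoteData) {
  const timestamp = new Date();
  for (const item of remoteData) {
    await RemoteItem.findOneAndUpdate(
      { remoteId: item.remoteId },
      { $set: { ...item, updatedAt: timestamp } },
      { upsert: true }
    );
  }
  // Anything not touched in this run no longer exists remotely.
  await RemoteItem.updateMany(
    { updatedAt: { $lt: timestamp } },
    { $set: { active: false } }
  );
  return timestamp;
}
```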
Seems straightforward enough. But when dealing with tens of thousands of documents, it gets quite slow.
I looked at Bulk.find().upsert() in the MongoDB documentation, but the examples seemed to apply only when the query used to find the document is static.
What could I do here?
Turns out I hadn't fully grasped the MongoDB Bulk API - I had missed that it is essentially an array of commands that only gets sent to the database when you call execute(). In the end, this is what I had to do:
timestamp = new Date
bulkOp = collection.initializeUnorderedBulkOp()
for item in remoteData
  bulkOp.find({ remoteId: item.remoteId }).upsert().updateOne { $set: { updatedAt: timestamp, /* other properties */ } }
bulkOp.execute()
collection.update { updatedAt: { $lt: timestamp } }, { $set: { active: false } }, { multi: true }
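Newer drivers expose the same batching through Collection.bulkWrite, where the queued commands are just an array of plain objects. A sketch of building that array, assuming the field names from the post (remoteId, updatedAt); the function name is mine:

```javascript
// Builds a bulkWrite operations array equivalent to the unordered bulk op
// above: one upserting updateOne per remote item, all matched by remoteId.
function buildBulkOps(remoteData, timestamp) {
  return remoteData.map((item) => ({
    updateOne: {
      filter: { remoteId: item.remoteId },
      update: { $set: { ...item, updatedAt: timestamp } },
      upsert: true,
    },
  }));
}

// Usage - one round trip instead of one per item:
//   await collection.bulkWrite(buildBulkOps(remoteData, new Date()), { ordered: false });
```

Passing `ordered: false` mirrors initializeUnorderedBulkOp: the server may execute the operations in any order and continues past individual failures, which is fine here because each operation targets a different remoteId.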