I have a MongoDB collection like:
[{'_id': abc, 'Sex': 'f'}, {'_id': bcd, 'Sex': 'm'}, {'_id': cde, 'Sex': 'm'}, {'_id': def, 'Sex': 'm'}]
I also have a Python list of dictionaries like:
[{'_id': abc, 'Age': 70}, {'_id': bcd, 'Age': 51}, {'_id': cde}, {'_id': def, 'Age': 'unknown'}]
I need to match each dictionary to a document by _id
and update the document accordingly, on a large collection, e.g., to produce:
[{'_id': abc, 'Sex': 'f', 'Age': 70}, {'_id': bcd, 'Sex': 'm', 'Age': 51}, {'_id': cde, 'Sex': 'm'}, {'_id': def, 'Sex': 'm', 'Age': 'unknown'}]
Is there a way to do this efficiently for a large collection? (Not just iterating through the list of dictionaries and using update_one
on each document.)
You can perform Bulk Write Operations instead of sending a single update operation per document.
If you are using PyMongo, it will automatically split the batched update operations into smaller sub-batches based on the maximum message size accepted by the server.
For example, you could loop through the list of dictionaries to build UpdateOne
write operations, and submit them as an unordered bulk write of up to 10,000 updates at a time:
from pprint import pprint
from pymongo import UpdateOne
from pymongo.errors import BulkWriteError

requests = [
    UpdateOne({'_id': 'abc'}, {'$set': {'Age': 70}}),
    UpdateOne({'_id': 'bcd'}, {'$set': {'Age': 51}}),
]
try:
    db.test.bulk_write(requests, ordered=False)
except BulkWriteError as bwe:
    pprint(bwe.details)
Please note that unordered bulk write operations are batched and sent to the server in arbitrary order where they may be executed in parallel. Any errors that occur are reported after all operations are attempted.