
Quickly add field to large MongoDB collection by _id

I have a MongoDB collection like:

[{'_id': 'abc', 'Sex': 'f'}, {'_id': 'bcd', 'Sex': 'm'}, {'_id': 'cde', 'Sex': 'm'}, {'_id': 'def', 'Sex': 'm'}]

I also have a Python list of dictionaries like:

[{'_id': 'abc', 'Age': 70}, {'_id': 'bcd', 'Age': 51}, {'_id': 'cde'}, {'_id': 'def', 'Age': 'unknown'}]

I need to match by _id on a large collection, updating each document, e.g., as below:

[{'_id': 'abc', 'Sex': 'f', 'Age': 70}, {'_id': 'bcd', 'Sex': 'm', 'Age': 51}, {'_id': 'cde', 'Sex': 'm'}, {'_id': 'def', 'Sex': 'm', 'Age': 'unknown'}]

Is there a way to do this efficiently for a large collection? (Not just iterating through the list of dictionaries and using update_one on each document.)

Is there a way to do this efficiently for a large collection?

You can perform Bulk Write Operations instead of sending a single update operation per document.

If you are using PyMongo, it will automatically split the batch of update operations into smaller sub-batches based on the maximum message size accepted by MongoDB.

For example, you could loop through your list of dictionaries to build UpdateOne write objects, then send them as unordered bulk write operations of, say, 10,000 updates at a time.

 from pprint import pprint

 from pymongo import UpdateOne
 from pymongo.errors import BulkWriteError

 # Each UpdateOne matches a document by _id and $sets the new field.
 requests = [
     UpdateOne({'_id': 'abc'}, {'$set': {'Age': 70}}),
     UpdateOne({'_id': 'bcd'}, {'$set': {'Age': 51}}),
 ]
 try:
     db.test.bulk_write(requests, ordered=False)
 except BulkWriteError as bwe:
     pprint(bwe.details)

Please note that unordered bulk write operations are batched and sent to the server in arbitrary order where they may be executed in parallel. Any errors that occur are reported after all operations are attempted.
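Building on that, here is a minimal sketch of the full loop-and-batch approach described above. The collection db.test, the list name people, and the batch size of 10,000 are assumptions for illustration; adapt them to your own data:

 from pprint import pprint

 from pymongo import MongoClient, UpdateOne
 from pymongo.errors import BulkWriteError

 client = MongoClient()        # assumes a mongod running on the default localhost port
 db = client.test_database     # hypothetical database name

 # Your Python list of dictionaries keyed by _id.
 people = [
     {'_id': 'abc', 'Age': 70},
     {'_id': 'bcd', 'Age': 51},
     {'_id': 'cde'},
     {'_id': 'def', 'Age': 'unknown'},
 ]

 BATCH_SIZE = 10000

 # Build one UpdateOne per dictionary that has fields to set,
 # matching on _id and $set-ing everything else.
 requests = []
 for doc in people:
     fields = {k: v for k, v in doc.items() if k != '_id'}
     if fields:
         requests.append(UpdateOne({'_id': doc['_id']}, {'$set': fields}))

 # Send the requests as unordered bulk writes, BATCH_SIZE at a time.
 for i in range(0, len(requests), BATCH_SIZE):
     try:
         result = db.test.bulk_write(requests[i:i + BATCH_SIZE], ordered=False)
         print(result.bulk_api_result)
     except BulkWriteError as bwe:
         pprint(bwe.details)

Dictionaries with no extra fields (like {'_id': 'cde'}) are simply skipped, which matches the expected output in the question.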
