Update multiple records from a DataFrame into MongoDB with upsert=True

I have a dataframe of almost 120,000 records, shown below. I also have a MongoDB collection whose documents look exactly the same as the dataframe rows:

ItemID  ParentID  ItemRating  ItemPrice  Qty
A1      ItemA1    0           12         100
A2      ItemA2    0           15         200
B1      ItemB1    0           20         300
B2      ItemB2    0           25         400
B3      ItemB3    0           30         150

Now I want to update and insert records from my dataframe into the Mongo collection under the following conditions:

  1. If the combination of ItemID and ParentID is already present in the collection, update the remaining columns of that document from the dataframe.
  2. If the combination of ItemID and ParentID is not present, insert a new record. ItemID and ParentID together act as a unique key for deciding whether to update or insert.

I know this can be done with PyMongo's update_many method by setting upsert=True, but I am not sure how. How should I write my filter condition?

Regards, Vipul

You won't be able to use update_many(), because it takes a single filter that is applied to every matching document, whereas in your case each row needs its own filter on its (ItemID, ParentID) pair. What you need is replace_one() in a loop with upsert=True. Something like:

from pymongo import MongoClient
import pandas as pd

# Port 27019 is this answer's local setup; the MongoDB default is 27017.
db = MongoClient('localhost', 27019)['testdatabase1']
df = pd.DataFrame({'ItemID': ['A1', 'A2', 'B1', 'B2', 'B3'],
                   'ParentID': ['ItemA1', 'ItemA2', 'ItemB1', 'ItemB2', 'ItemB3'],
                   'ItemRating': [0, 0, 0, 0, 0],
                   'ItemPrice': [12, 15, 20, 25, 30],
                   'Qty': [100, 200, 300, 400, 150]
                   })

# to_dict('records') yields one plain dict per row
for record in df.to_dict('records'):
    # Match on the (ItemID, ParentID) pair; insert a new document if nothing matches
    result = db.testcollection.replace_one(
        {'ItemID': record['ItemID'], 'ParentID': record['ParentID']},
        record, upsert=True)
    if result.upserted_id is not None:
        print(f'Inserted: {record}')
    else:
        print(f'Replaced: {record}')
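The loop above replaces each matched document wholesale (everything except _id), so any extra fields stored in the collection but absent from the dataframe would be dropped. If you only want to overwrite the columns the dataframe carries, which is closer to "update the remaining columns" in the question, a hedged alternative is update_one() with $set and upsert=True. The sketch below splits each row dict into a key filter and a $set update. KEY_FIELDS, split_record and upsert_rows are hypothetical names, and the collection is assumed to be a PyMongo Collection (or anything with a compatible update_one).

```python
# Hypothetical helper names; the key fields come from the question.
KEY_FIELDS = ('ItemID', 'ParentID')


def split_record(record):
    """Split one row dict into (filter, update) for an upserting update_one."""
    key = {k: record[k] for k in KEY_FIELDS}
    rest = {k: v for k, v in record.items() if k not in KEY_FIELDS}
    # On an upsert insert, MongoDB seeds the new document with the filter's
    # equality fields and then applies $set, so the key fields are not repeated.
    return key, {'$set': rest}


def upsert_rows(collection, records):
    """Upsert each record; returns (inserted, matched) counts."""
    inserted = matched = 0
    for record in records:
        key, update = split_record(record)
        result = collection.update_one(key, update, upsert=True)
        if result.upserted_id is not None:
            inserted += 1
        else:
            matched += result.matched_count
    return inserted, matched
```

With the df and db from the answer, this would be called as upsert_rows(db.testcollection, df.to_dict('records')). Fields not named in $set are left untouched on matched documents.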

Since you are dealing with a lot of data, you probably don't want one round trip per row. Instead, compile a list of ReplaceOne operations from your dataframe and execute them with a single bulk_write() call. Note that bulk_write() is not a transaction: it batches the writes into far fewer requests, but the set of writes is not atomic as a whole. See PyMongo: Bulk Write Operations.
