MongoDB: How to insert document in collection that exists in other collection as well?

I have two collections, EN_PR2019 and EN_PR2018. They mostly contain the same things, but from different years. After inserting all the documents into EN_PR2019, I'm trying to insert documents into EN_PR2018 that may have the same _id as documents in EN_PR2019. I read that I need to create an index on the collection to be able to have records with the same _id in two different collections. Right now I'm getting:

pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: Database.EN_PR2018 index: id_1 dup key: { id: null }

How do I insert the same record, with the same _id, into two different collections without raising errors or having to deal with duplicates?

import pymongo

def check_record(collection, record_id):
    """Check if record exists in collection
        Args:
            collection (pymongo.collection.Collection): Collection to search
            record_id (str): record id as in collection
    """
    return collection.find_one({'id': record_id})

def collection_index(collection, index):
    """Check whether the index exists for the collection,
    and create it if not

        Args:
            collection (pymongo.collection.Collection): Collection to index
            index (str): Dict key to be used as an index
    """
    if index not in collection.index_information():
        return collection.create_index([(index, pymongo.ASCENDING)], unique=True)

def push_upstream(collection, record_id, record):
    """Insert record in collection
        Args:
            collection (pymongo.collection.Collection): Collection to insert into
            record_id (str): record _id to be put for record in collection
            record (dict): Data to be pushed in collection
    """
    return collection.insert_one({"_id": record_id}, {"$set": record})

def update_upstream(collection, record_id, record):
    """Update record in collection
        Args:
            collection (pymongo.collection.Collection): Collection to update
            record_id (str): record _id as in collection
            record (dict): Data to be updated in collection
    """
    return collection.update_one({"_id": record_id}, {"$set": record}, upsert=True)

def executePushPlayer(db):

    playerstats = load_file(db.playerfile)
    collection = db.DATABASE[db.league + db.season]
    collection_index(collection, 'id')
    for player in playerstats:
        existingPost = check_record(collection, player['id'])
        if existingPost:
            update_upstream(collection, player['id'], player)
        else:
            push_upstream(collection, player['id'], player)

if __name__ == '__main__':
    test = DB('EN_PR', '2018')
    executePushPlayer(test)

The _id field in every document inserted into a MongoDB database is special: it is always indexed, and that index is a unique index. The uniqueness applies per collection, so it is perfectly reasonable to reuse the _id values from one collection in another, as long as the uniqueness constraint is not breached in the new collection.
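As a rough sketch of that point, the same record can be written under the same _id to several collections with one upsert per collection. The helper names save_record and save_to_all are hypothetical, and the collection arguments are assumed to be pymongo Collection objects; this is one possible way to do it, not the asker's code.

```python
def save_record(collection, record_id, record):
    """Insert or update `record` under `_id == record_id` in `collection`.

    `collection` is assumed to be a pymongo Collection (any object with a
    compatible `update_one` method works). Hypothetical helper name.
    """
    # With upsert=True the document is created when nothing matches,
    # and its _id is taken from the equality filter.
    return collection.update_one(
        {"_id": record_id},
        {"$set": record},
        upsert=True,
    )


def save_to_all(collections, record_id, record):
    """Write the same record, with the same _id, to each collection."""
    return [save_record(c, record_id, record) for c in collections]
```

With a live server this might be called as save_to_all([db["EN_PR2018"], db["EN_PR2019"]], player["id"], player), where db is a pymongo Database. No extra index is needed for this, because each collection enforces the uniqueness of _id on its own.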

From the error I would guess that several of your player["id"] values are null or missing: the duplicate key is reported on the id_1 index, that is, on the id field rather than on _id. That points to a problem in your load_file function.
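One way to check this guess is to scan the loaded records for a missing or null id before inserting anything. The helper below is a hypothetical sketch that works on plain dicts, and the sample players list is made up for illustration. It is also worth checking push_upstream: insert_one takes the document itself as its first argument, so the call as written stores only {"_id": record_id} with no id field. A unique index treats a document that lacks the indexed field as having a null value, so a second such insert would raise the same { id: null } error even if the input data is fine.

```python
def records_without_key(records, key="id"):
    """Return (position, record) pairs whose `key` is missing or None."""
    return [(i, r) for i, r in enumerate(records) if r.get(key) is None]


# Made-up sample data standing in for the output of load_file.
players = [
    {"id": "p1", "name": "A"},
    {"name": "B"},              # no id at all
    {"id": None, "name": "C"},  # explicit null
]

bad = records_without_key(players)
print(bad)  # [(1, {'name': 'B'}), (2, {'id': None, 'name': 'C'})]
```

If the list comes back empty for the real data, the input is probably fine and the insert call is the more likely culprit.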
