
Insert to MongoDB collection that has unique key with Python

I have a collection called englishWords, with a unique index on the "word" field. When I do this:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

f = open('book.txt')
for word in f.read().split():
    coll.insert( { "word": word } } )

I get this error message:

pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: tongler.englishWords.$word_1 dup key: { : "Harry" }

and the insertion stops as soon as it reaches the first word that already exists.

I do not want to implement an existence check myself; I want to rely on the unique index without running into errors.

To avoid unnecessary exception handling, you could do an upsert:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

with open('book.txt') as f:
    for word in f.read().split():
        # Replace the matching document, or insert it if none exists.
        coll.replace_one({'word': word}, {'word': word}, upsert=True)

The last argument specifies that MongoDB should insert the value if it does not already exist.

Here's the documentation.
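If you also want to know how many words were actually new, the same upsert can be wrapped in a small helper. This is only a sketch: `upsert_words` is a hypothetical name, and it assumes `coll` is a PyMongo 3.x or later collection, where `replace_one` returns a result object with an `upserted_id` attribute.

```python
def upsert_words(coll, words):
    """Upsert every word; return how many were not yet in the collection."""
    new_words = 0
    for word in words:
        result = coll.replace_one({'word': word}, {'word': word}, upsert=True)
        # upserted_id is set only when the upsert inserted a new document.
        if result.upserted_id is not None:
            new_words += 1
    return new_words
```

With the collection from the question, it could be called as `upsert_words(coll, open('book.txt').read().split())`.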


EDIT: For even better performance with a long list of words, you could do it in bulk like this:

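from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

# Legacy bulk API: deprecated in PyMongo 3.5 and removed in PyMongo 4.0.
bulkop = coll.initialize_unordered_bulk_op()
with open('book.txt') as f:
    for word in f.read().split():
        # upsert() only marks the operation; replace_one() actually queues it.
        bulkop.find({'word': word}).upsert().replace_one({'word': word})

bulkop.execute()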

Taken from the bulk operations documentation.
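On newer PyMongo versions, where `initialize_unordered_bulk_op` is no longer available, the same idea can be expressed with `bulk_write`. The following is a rough sketch, not the answer's original code: `batches` and `bulk_upsert_words` are hypothetical helper names, and the batch size of 1000 is an arbitrary assumption. Repeated words are dropped first so that one unordered batch never upserts the same key twice.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_upsert_words(coll, words, batch_size=1000):
    """Upsert each distinct word with unordered bulk writes.

    Returns the number of documents that were newly inserted.
    """
    # Imported here so the batching helper above has no PyMongo dependency.
    from pymongo import ReplaceOne

    inserted = 0
    # dict.fromkeys drops repeated words while keeping first-seen order.
    for batch in batches(dict.fromkeys(words), batch_size):
        ops = [ReplaceOne({'word': w}, {'word': w}, upsert=True) for w in batch]
        result = coll.bulk_write(ops, ordered=False)
        inserted += result.upserted_count
    return inserted
```

Sending the words in batches keeps each request to the server bounded in size while still avoiding one round trip per word.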

You could do the following:

import pymongo

for word in f.read().split():
    try:
        coll.insert_one({"word": word})
    except pymongo.errors.DuplicateKeyError:
        # The word is already in the collection; move on to the next one.
        continue

This skips any word that is already in the collection instead of stopping at the first duplicate.
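The same "ignore duplicates" idea can also be applied to a whole batch at once. The sketch below assumes PyMongo 3.x or later and uses a hypothetical helper name, `insert_ignoring_duplicates`. With `ordered=False` the server attempts every document, and the resulting `BulkWriteError` is swallowed only if all of its write errors carry code 11000, the duplicate-key code shown in the question.

```python
def insert_ignoring_duplicates(coll, words):
    """Insert words in one unordered batch; return how many were inserted."""
    # Imported here so the function can be defined before PyMongo is loaded.
    from pymongo.errors import BulkWriteError

    docs = [{'word': w} for w in words]
    if not docs:
        return 0  # insert_many rejects an empty list
    try:
        result = coll.insert_many(docs, ordered=False)
        return len(result.inserted_ids)
    except BulkWriteError as exc:
        errors = exc.details.get('writeErrors', [])
        # Re-raise anything that is not a duplicate-key error (code 11000).
        if any(err.get('code') != 11000 for err in errors):
            raise
        return exc.details.get('nInserted', 0)
```

This trades the per-word round trip of the loop above for a single request, at the cost of slightly more involved error handling.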

Also, did you drop the collection before trying?

I've just run your code and everything looks good, except that you have an extra } on the last line. Delete that, and you don't have to drop any collection. Every insert creates its own batch of data, so there is no need to drop the previous collection.

Well, the error message indicates that the key Harry has already been inserted and you are trying to insert it again with the same key. It looks like this is not your entire code?
