
How to iterate and update documents with PyMongo?

I have a simple, single-client setup for MongoDB and PyMongo 2.6.3. The goal is to iterate over each document in the collection named collection and update (save) each one in the process. The approach I'm using looks roughly like:

cursor = collection.find({})
index = 0
count = cursor.count()
while index != count:
    doc = cursor[index]
    print 'updating doc ' + doc['name']
    # modify doc ..
    collection.save(doc)
    index += 1
cursor.close()

The problem is that save apparently modifies the order of documents in the cursor. For example, if my collection is made of 3 documents (ids omitted for clarity):

{
    "name": "one"
}
{
    "name": "two"
}
{
    "name": "three"
}

The above program outputs:

> updating doc one
> updating doc two
> updating doc two

If, however, the line collection.save(doc) is removed, the output becomes:

> updating doc one
> updating doc two
> updating doc three

Why is this happening? What is the right way to safely iterate and update documents in a collection?

Found the answer in the MongoDB documentation:

Because the cursor is not isolated during its lifetime, intervening write operations on a document may result in a cursor that returns a document more than once if that document has changed. To handle this situation, see the information on snapshot mode.
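The effect described in that quote can be reproduced without a database. What follows is only a toy model, not MongoDB's actual storage engine: it assumes documents are read by position (like cursor[index] in the question) and that a saved document which grows is moved to the end of the storage order. The names scan_with_relocation and grows are made up for illustration, and which document grows is an assumption chosen to mirror the question's output.

```python
def scan_with_relocation(names, grows):
    """Toy model: positional reads over storage in which an update that
    grows a document relocates it to the end."""
    storage = list(names)         # stand-in for the on-disk document order
    seen = []
    index = 0
    count = len(storage)
    while index != count:
        name = storage[index]     # read by position, like cursor[index]
        seen.append(name)
        if name in grows:         # the write made this document larger...
            storage.remove(name)  # ...so it no longer fits in place
            storage.append(name)  # and is moved to the end
        index += 1
    return seen


visited = scan_with_relocation(["one", "two", "three"], grows={"two"})
print(visited)  # ['one', 'two', 'two']
```

In this model "two" is visited twice and "three" is never reached, because the write changed the order underneath a loop that still relies on positions.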

Snapshot mode is enabled on the cursor and makes a useful guarantee:

snapshot() traverses the index on the _id field and guarantees that the query will return each document (with respect to the value of the _id field) no more than once.

To enable snapshot mode with PyMongo:

cursor = collection.find(spec={},snapshot=True)

as per the PyMongo find() documentation. Confirmed that this fixed my problem.

Snapshot does the work.

But from PyMongo 2.9 onwards, the syntax is slightly different:

cursor = collection.find(modifiers={"$snapshot": True})

or, for any version,

cursor = collection.find({"$query": {}, "$snapshot": True})

as per the PyMongo documentation.

I couldn't recreate your situation, but off the top of my head: because fetching the results the way you're doing it gets them one by one from the db, you may actually be creating more as you go (saving and then fetching the next one).

You can try holding the result in a list (that way, you're fetching all results at once, which might be heavy depending on your query):

cursor = collection.find({})
results = list(cursor)  # fetch every document up front (replaces cursor.count())
cursor.close()
# Iterate the list directly: no counter and no cursor[index] lookups needed,
# since 'res' holds the current record on each pass.
for res in results:
    print 'updating doc ' + res['name']
    # modify doc ..
    collection.save(res)

Hope it helps.
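If holding every full document in memory is too heavy, a lighter variant of the same idea is to collect only the _id values first and then read and write each document by _id. The sketch below is one possible shape, not a definitive implementation: update_each and the modify callback are hypothetical names, and replace_one assumes PyMongo 3.0 or later (on 2.x, collection.save(doc) as in the question would take its place). It also does not depend on snapshot mode, which MongoDB deprecated in 3.6 and removed in 4.0.

```python
def update_each(collection, modify):
    """Apply modify(doc) to every document, visiting each _id once."""
    # Materialise only the _id values up front, so later writes cannot
    # reorder or repeat what the loop visits.
    ids = [d["_id"] for d in collection.find({}, {"_id": 1})]
    updated = 0
    for _id in ids:
        doc = collection.find_one({"_id": _id})
        if doc is None:  # removed by another writer in the meantime
            continue
        modify(doc)      # caller-supplied, mutates doc in place
        collection.replace_one({"_id": _id}, doc)
        updated += 1
    return updated
```

Documents inserted after the _id list is built are not visited, which is usually what is wanted when updating an existing set.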

