
Safe+efficient way to modify Mongo objects while iterating over a cursor?

I've got some code that examines every object in a Mongo collection (iterating over the result of a find() with no parameters) and makes changes to some of them. It seems that this isn't a safe thing to do: my changes are saved, but when I continue iterating through the cursor, a subset of the changed objects (10-15%) show up a second time. I wasn't changing the document ID or any indexed field.

I figure I could avoid this problem by grabbing all the document IDs ahead of time (convert the cursor to an array), but these are large collections so I'd really like to avoid that.

I noticed that the result of find() by default doesn't seem to have any defined order, so I tried putting an explicit sort on the cursor, {"_id":1}. This seems to have fixed the problem: now nothing shows up twice no matter what I modify. But I don't know if that's a good/reliable approach. As far as I can tell from the documentation, adding a sort does not make it pre-query all the IDs; if so, that's nice, but then I don't know why it would fix the problem.

Is it just a bad idea to use cursors while changing stuff?

I'm using Scala/Casbah, if that matters.

It sounds like what you want is a snapshot query. Here's more info on how to do that:

http://www.mongodb.org/display/DOCS/How+to+do+Snapshotted+Queries+in+the+Mongo+Database
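Why walking the `_id` index (which is what a snapshot or an explicit `{"_id":1}` sort amounts to) avoids the repeats can be shown with a toy model. The sketch below is plain Python, not MongoDB code. It assumes a storage layer that relocates a document to the end of the data file when the document grows, which is a deliberate simplification of how older MongoDB storage could behave. All names (`grow`, `move_to_end`, `scan_storage_order`, `scan_id_order`, `make_collection`) are made up for illustration.

```python
def grow(doc):
    """Add a field to even-numbered documents once; True if the doc got bigger."""
    if doc["_id"] % 2 or "tags" in doc:
        return False
    doc["tags"] = ["migrated"]
    return True


def move_to_end(storage, index, pos):
    """Model a relocation: free the old slot and append the record."""
    doc = storage[pos]
    storage[pos] = None
    storage.append(doc)
    index[doc["_id"]] = len(storage) - 1


def scan_storage_order(storage, index):
    """Walk slots front to back, the way an unsorted scan might."""
    visited = []
    pos = 0
    while pos < len(storage):
        doc = storage[pos]
        if doc is not None:
            visited.append(doc["_id"])
            if grow(doc):
                move_to_end(storage, index, pos)  # doc is now ahead of the scan
        pos += 1
    return visited


def scan_id_order(storage, index):
    """Walk the _id index; _id never changes, so each key is met once."""
    visited = []
    for _id in sorted(index):  # stands in for walking the index in key order
        doc = storage[index[_id]]
        visited.append(_id)
        if grow(doc):
            move_to_end(storage, index, index[_id])
    return visited


def make_collection(n):
    """Build n tiny documents plus an _id -> slot index."""
    storage = [{"_id": i} for i in range(n)]
    index = {doc["_id"]: pos for pos, doc in enumerate(storage)}
    return storage, index
```

In this model, with six documents the storage-order scan visits `[0, 1, 2, 3, 4, 5, 0, 2, 4]`, because every relocated document lands ahead of the scan position, while the index-order scan visits each `_id` exactly once regardless of where the documents end up.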

Consider using an update command that modifies multiple documents: http://docs.mongodb.org/manual/tutorial/modify-documents/

Also, since you are only modifying some objects, consider using a query that only returns documents that you are actually going to modify rather than scanning the entire collection.
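As a rough illustration of both suggestions together, here is a minimal Python sketch that lets the server select and modify the documents in one multi-document update. It assumes `collection` is a PyMongo `Collection` (the question uses Scala/Casbah, so treat this as a translation of the idea, not of the asker's code). The field names `last_seen` and `status` and the function name `flag_stale_users` are hypothetical.

```python
def flag_stale_users(collection, cutoff):
    """Mark every document last seen before `cutoff` as inactive.

    The filter matches only documents that actually need the change,
    so no cursor is opened on the client and nothing is scanned twice.
    """
    # Hypothetical fields: only documents that are stale and not yet flagged.
    query = {"last_seen": {"$lt": cutoff}, "status": {"$ne": "inactive"}}
    # The change is expressed with an update operator, applied server-side.
    change = {"$set": {"status": "inactive"}}
    result = collection.update_many(query, change)
    return result.modified_count
```

This only works when the change can be expressed with update operators; anything that needs arbitrary client-side logic per document still has to iterate.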

Iterating over the result of a find and modifying objects may seem more convenient and flexible, as you are not limited to what you can do with update operators, and you can write code in your language of choice to modify the document. However, there is the problem you described, as well as other limitations:

http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

For example, snapshot queries are not 100% safe, and they cannot be used with sharded collections, so if you later decide to shard, your solution will break.
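If per-document client-side logic is still needed, one option that does not depend on snapshot mode is to read the collection in `_id` order in small pages and resume each page after the last `_id` seen. This is only a sketch of that idea, not something the linked FAQ prescribes verbatim. It assumes `collection` is a PyMongo `Collection`; `update_in_id_batches` and `transform` are hypothetical names, and `transform` is assumed to return a dict of fields to set, or `None` to leave the document alone.

```python
def update_in_id_batches(collection, transform, batch_size=500):
    """Apply `transform` to every document, paging through the _id index.

    Each page is fully read into memory before any write happens, and the
    next page starts strictly after the last _id seen, so a document that
    moves on disk cannot be returned again.
    """
    last_id = None
    changed = 0
    while True:
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        batch = list(collection.find(query).sort("_id", 1).limit(batch_size))
        if not batch:
            return changed
        for doc in batch:
            new_fields = transform(doc)  # hypothetical per-document logic
            if new_fields:
                collection.update_one({"_id": doc["_id"]}, {"$set": new_fields})
                changed += 1
        last_id = batch[-1]["_id"]
```

Only one page of documents is held in memory at a time, which avoids loading every ID up front. Documents inserted during the run with a smaller `_id` than the current position are not visited.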

If you need to modify a very large number of objects in a more complicated way, map-reduce or the aggregation pipeline may be a way to solve your problem:

http://docs.mongodb.org/manual/core/aggregation-pipeline/

http://docs.mongodb.org/manual/core/map-reduce/
