
How do I get a list of just the ObjectIds using pymongo?

I have the following code:

from pymongo import MongoClient

client = MongoClient()
data_base = client.hkpr_restore
agents_collection = data_base.agents
agent_ids = agents_collection.find({},{"_id":1})

This gives me the following result:

{u'_id': ObjectId('553020a8bf2e4e7a438b46d9')}
{u'_id': ObjectId('553020a8bf2e4e7a438b46da')}
{u'_id': ObjectId('553020a8bf2e4e7a438b46db')}

How do I get just the ObjectIds, so that I can then use each ID to search another collection?

Use distinct:

In [27]: agent_ids = agents_collection.find().distinct('_id')

In [28]: agent_ids
Out[28]: 
[ObjectId('553662940acf450bef638e6d'),
 ObjectId('553662940acf450bef638e6e'),
 ObjectId('553662940acf450bef638e6f')]

In [29]: agent_id2 = [str(id) for id in agents_collection.find().distinct('_id')]

In [30]: agent_id2
Out[30]: 
['553662940acf450bef638e6d',
 '553662940acf450bef638e6e',
 '553662940acf450bef638e6f']

Try a list comprehension that keeps just the _ids, as follows:

>>> client = MongoClient()
>>> data_base = client.hkpr_restore
>>> agents_collection = data_base.agents
>>> result = agents_collection.find({},{"_id":1})
>>> agent_ids = [x["_id"] for x in result]
>>> 
>>> print(agent_ids)
[ ObjectId('553020a8bf2e4e7a438b46d9'),  ObjectId('553020a8bf2e4e7a438b46da'),  ObjectId('553020a8bf2e4e7a438b46db')]
>>>
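
Once the list exists, one way to use it against another collection is a single $in query instead of one query per ID. The following is only a sketch: the helper name find_by_agent_ids and the field name "agent_id" are hypothetical, since the question does not say how the other collection refers to an agent.

```python
def find_by_agent_ids(other_collection, agent_ids, field="agent_id"):
    """Return documents in other_collection whose `field` is one of agent_ids.

    other_collection is expected to be a pymongo Collection. `field` is a
    hypothetical name for wherever that collection stores the agent's _id.
    """
    # $in matches any document whose field equals one of the listed values.
    query = {field: {"$in": list(agent_ids)}}
    return list(other_collection.find(query))
```

It could be called as find_by_agent_ids(data_base.some_other_collection, agent_ids), where some_other_collection is a placeholder for your own collection. For very long ID lists the query document itself can become large, so splitting the list into batches may be necessary.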

I would like to add something more general than querying for all _ids.

import bson
[...]
results = agents_collection.find({})
objects = [v for result in results for k,v in result.items()
          if isinstance(v,bson.objectid.ObjectId)]

Context: saving objects in GridFS creates ObjectIds; this snippet helped me retrieve all of them for further querying.

I solved the problem by following this answer: add a hint to the find call, then simply iterate through the returned cursor.

db.c.find({},{_id:1}).hint({_id:1});

I am guessing that without the hint the cursor would fetch the whole document on each iteration, making the iteration extremely slow. With the hint, the cursor returns only the ObjectId and the iteration finishes very quickly.
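
The command in this answer is mongo shell syntax, while the question uses pymongo. A rough pymongo equivalent might look like the sketch below. The function name iter_object_ids is introduced here for illustration, and whether the hint really makes a difference is this answer's guess, not something measured.

```python
def iter_object_ids(collection):
    """Yield each document's _id from a pymongo Collection, one at a time."""
    # Project only _id, and hint the default _id index using pymongo's
    # list-of-(key, direction) form; 1 means ascending.
    cursor = collection.find({}, {"_id": 1}).hint([("_id", 1)])
    for doc in cursor:
        yield doc["_id"]
```

Because this is a generator, the IDs are consumed as the cursor streams them, so the full set never has to sit in memory at once.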

The background is that I am working on an ETL job that requires syncing one Mongo collection to another while modifying the data according to some criteria. The total number of ObjectIds is around 100,000,000.

I tried using distinct but got the following error:

Error in : distinct too big, 16mb cap

I also tried aggregation with $group, as suggested in answers to other similar questions, only to hit a memory consumption error.
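
Both of those attempts ask the server for the whole set of IDs in one result. Iterating a cursor and handling the IDs in fixed-size batches avoids that. Below is a minimal sketch of such a batching helper; the name batched and the batch size are choices made here, not part of the original job.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        # islice pulls at most batch_size items without consuming more.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with a pymongo cursor (source and process are placeholders):
# ids = (doc["_id"] for doc in source.find({}, {"_id": 1}))
# for id_batch in batched(ids, 1000):
#     process(id_batch)
```

Only one batch is held in memory at a time, which matters when the collection has on the order of 100,000,000 IDs.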

Although I wasn't searching for the _id, I was extracting another field. I found this method fast (assuming you have an index on the field):

list_of_strings = {x.get("MY_FIELD") for x in db.col.find({},{"_id": 0, "MY_FIELD": 1}).hint("MY_FIELDIdx")}

Where MY_FIELDIdx is the name of the index on the field I'm trying to extract.
