How to use find() on nested documents two or more levels deep?
Here is my sample MongoDB database:
database image for one object
The above is a database with an array of articles. I fetched only one object for simplicity.
database image for multiple objects (max 20, as that is the size limit)
I have about 18k such entries. I have to extract the description and title fields inside the first element of the articles array (articles, then 0). The find() method is the problem here. I have tried this:
    for i in db.ncollec.find({'status': "ok"}, {'articles.0.title': 1, 'articles.0.description': 1}):
        for j in i:
            save.write(j)
After executing the code, the file save contains this:
    _id
    articles
    _id
    articles
and it goes on and on.
Any help on how to print what I stated above?
My entire code for reference:
    import json
    import newsapi
    from newsapi import NewsApiClient
    import pymongo
    from pymongo import MongoClient

    client = MongoClient()
    db = client.dbasenews
    ncollec = db.ncollec
    newsapi = NewsApiClient(api_key='**********')
    source = open('TextsExtractedTemp.txt', 'r')
    destination = open('NewsExtracteddict.txt', "w")
    for word in source:
        if word == '\n':
            continue
        all_articles = newsapi.get_everything(q=word, language='en', page_size=1)
        print(all_articles)
        json.dump(all_articles, destination)
        destination.write("\n")
        try:
            ncollec.insert(all_articles)
        except:
            pass
Okay, so I checked a few things to refresh my rusty memory of pymongo, and here is what I found.
The correct query should be:
    db.ncollec.find({'status': "ok",
                     'articles.title': {'$exists': True},
                     'articles.description': {'$exists': True}})
Now, if you do this:
    query = {'status': "ok",
             'articles.title': {'$exists': True},
             'articles.description': {'$exists': True}}
    for item in db.ncollec.find(query):
        print(item)
and it doesn't show anything, then the query is correct, but you are not looking at the right database, or the document tree is not the one you showed, or something along those lines.
But I assure you that, with the database you showed me, if you do...
    query = {'status': "ok",
             'articles.title': {'$exists': True},
             'articles.description': {'$exists': True}}
    for item in db.ncollec.find(query):
        # each item is a document; the fields live in the first element of its articles array
        save.write(item['articles'][0]['title'])
        save.write(item['articles'][0]['description'])
It'll do what you wished to do in the first place.
Now, the path item['articles'][0] might not be exactly right, but I can't really help more with that, since I am going by what you are showing on the screen. :)
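To illustrate why the original loop wrote only _id and articles: iterating over a document (a Python dict) yields its keys, not its nested values. The sketch below uses a hypothetical document shaped like the one in the question, and a hypothetical helper named first_article_fields that tolerates missing fields. It is only an illustration and does not touch MongoDB.

```python
def first_article_fields(doc):
    """Return (title, description) of the first article, or None if there is none."""
    articles = doc.get("articles") or []
    if not articles:
        return None
    first = articles[0]
    # .get() returns None instead of raising when a field is missing
    return first.get("title"), first.get("description")


# Hypothetical document, shaped like what find() returns with the projection above.
sample = {
    "_id": 1,
    "articles": [
        {"title": "Example title", "description": "Example description"},
    ],
}

# Iterating over a dict yields its keys, which is what ended up in the file.
print(list(sample))

# Indexing into the nested structure yields the actual values.
print(first_article_fields(sample))
```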
Okay, now. I have found something for you that is a bit more complicated, but is cool :) I'm not sure it'll work for you, though. I suspect the document tree you're giving us is wrong, since when you do .find({'status': 'ok'}) it doesn't return anything, whereas it should return all the documents with 'status': 'ok', and you have lots of them...
Anyway, here is the query, which you should use with the .aggregate() method instead of .find():
    elem = {'$match': {'status': 'ok',
                       'articles.title': {'$exists': True},
                       'articles.description': {'$exists': True}}}
    pipeline = [elem, {'$unwind': '$articles'}, elem]
If you want an explanation as to how this works, I invite you to read this page.
This query will return ONLY the elements in your array that have a title and a description, with a status OK. If an element doesn't have a title or a description, it will be ignored.
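As a rough sketch of how that pipeline could be used to write the titles and descriptions to a file: after $unwind, each result carries a single sub-document under articles rather than an array. The function names build_pipeline and write_articles are made up for this example, and collection is assumed to be a pymongo collection (anything with an aggregate() method will do).

```python
def build_pipeline():
    """Build the match / unwind / match pipeline described above."""
    match = {
        "$match": {
            "status": "ok",
            "articles.title": {"$exists": True},
            "articles.description": {"$exists": True},
        }
    }
    # The first match narrows the documents, $unwind emits one result per
    # array element, and the second match drops elements lacking the fields.
    return [match, {"$unwind": "$articles"}, match]


def write_articles(collection, out):
    """Write one 'title<TAB>description' line per article; return the count."""
    count = 0
    for doc in collection.aggregate(build_pipeline()):
        article = doc["articles"]  # a single sub-document after $unwind
        out.write("{}\t{}\n".format(article["title"], article["description"]))
        count += 1
    return count
```

With the database from the question, this would presumably be called as write_articles(db.ncollec, save), where save is the already opened output file.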