
How to use find() with nested documents for two levels or more?

Here is my sample MongoDB database:

database image for one object

The above is a database with an array of articles. I fetched only one object for simplicity.

database image for multiple objects (max 20, as that is the size limit)

I have about 18k such entries. I have to extract the description and title fields inside the articles and 0 subsections. The find() method is where I am stuck. I have tried this:

for i in db.ncollec.find({'status':"ok"}, { 'articles.0.title' : 1 , 'articles.0.description' : 1}):
    for j in i:
        save.write(j)

After executing the code, the file save contains this:

_id
articles
_id
articles

and it goes on and on.

Any help on how to print what I described above?

My entire code for reference:

    import json
    import newsapi
    from newsapi import NewsApiClient
    import pymongo
    from pymongo import MongoClient

    client = MongoClient()
    db = client.dbasenews
    ncollec = db.ncollec


    newsapi = NewsApiClient(api_key='**********')
    source = open('TextsExtractedTemp.txt', 'r')
    destination = open('NewsExtracteddict.txt', "w")
    for word in source:
        if word == '\n':
            continue
        all_articles = newsapi.get_everything(q=word, language='en', page_size=1)
        print(all_articles)
        json.dump(all_articles, destination)
        destination.write("\n")
        try:
            ncollec.insert(all_articles)
        except:
            pass

Okay, so I checked a little to refresh my rusty memory of pymongo, and here is what I found.

The correct query should be:

db.ncollec.find({ 'status' : "ok",
                  'articles.title' : { '$exists' : True },
                  'articles.description' : { '$exists' : True } })

Now, if you do this:

query = { 'status' : "ok",
          'articles.title' : { '$exists' : True },
          'articles.description' : { '$exists' : True } }
for item in db.ncollec.find(query):
    print(item)

and it prints nothing, the query itself is still correct; you are then looking at the wrong database, the wrong collection, or documents with a different tree than the one you showed.

But I assure you that, with the database you showed me, if you do this:

query = { 'status' : "ok",
          'articles.title' : { '$exists' : True },
          'articles.description' : { '$exists' : True } }
for item in db.ncollec.find(query):
    # each item is a dict; the nested fields sit under item['articles'][0]
    save.write(item['articles'][0]['title'])
    save.write(item['articles'][0]['description'])

it will do what you wanted in the first place.

Now, the index [0] might not be the right one, but I can't really help with that, since it is what you are showing in the screenshot. :)
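To see why the loop in the question wrote only key names, and how the nested fields can be reached instead, here is a small pure-Python sketch. It needs no MongoDB; the sample document and the helper name first_article_fields are made up for illustration and only assume the shape shown in the screenshot.

```python
# Hypothetical document, shaped like a projected result from the question
# (only _id and articles are present).
doc = {
    "_id": 1,
    "articles": [{"title": "Some title", "description": "Some description"}],
}

# Iterating over a dict yields its keys, not its values. This is why
# "for j in i: save.write(j)" wrote "_id" and "articles" over and over.
keys = [key for key in doc]


def first_article_fields(document):
    """Return (title, description) of the first article, or None if there is none."""
    articles = document.get("articles") or []
    if not articles:
        return None
    first = articles[0]
    # .get() returns None for a missing field instead of raising KeyError.
    return first.get("title"), first.get("description")


print(keys)
print(first_article_fields(doc))
```

Using .get() here is a design choice for documents that may lack one of the fields; direct indexing would raise KeyError on those.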


Okay, now. I have found something for you that is a bit more complicated, but cool :) I'm not sure it will work for you, though. I suspect the tree you gave us is not the real one, since .find( {'status' : 'ok'} ) returns nothing for you, when it should return every document with 'status' : 'ok', and you have lots of them...

Anyway, here is the query, which you should use with the .aggregate() method instead of .find():

elem = { '$match' : { 'status' : 'ok', 'articles.title' : { '$exists' : True }, 'articles.description' : { '$exists' : True } } }
pipeline = [ elem, { '$unwind' : '$articles' }, elem ]
results = db.ncollec.aggregate(pipeline)

If you want an explanation of how this works, I invite you to read this page.

This query returns ONLY the array elements that have both a title and a description, from documents whose status is ok. An element without a title or without a description is ignored.
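To make the three stages concrete, here is a rough pure-Python emulation of match, unwind, match on a couple of made-up documents. It is only a sketch of the behaviour described above, not how MongoDB implements it, and emulate_pipeline and the sample data are hypothetical.

```python
# Made-up documents for illustration; no MongoDB server is involved.
docs = [
    {"status": "ok", "articles": [
        {"title": "A", "description": "first"},
        {"title": "B"},  # no description, should be dropped
    ]},
    {"status": "error", "articles": [
        {"title": "C", "description": "third"},
    ]},
]


def emulate_pipeline(documents):
    """Emulate [$match, $unwind '$articles', $match] on plain dicts."""
    results = []
    for doc in documents:
        articles = doc.get("articles", [])
        # First $match: status is ok, and at least one array element
        # has a title and at least one has a description.
        if doc.get("status") != "ok":
            continue
        if not any("title" in a for a in articles):
            continue
        if not any("description" in a for a in articles):
            continue
        # $unwind: one output document per array element.
        for article in articles:
            unwound = {**doc, "articles": article}
            # Second $match: now checks each single element.
            if "title" in article and "description" in article:
                results.append(unwound)
    return results


result = emulate_pipeline(docs)
for item in result:
    print(item["articles"]["title"], item["articles"]["description"])
```

After the unwind step, articles is a single sub-document rather than a list, so the fields are read as item["articles"]["title"] with no index.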

