
Pymongo aggregate $in list

I'm trying to get some specific documents from my collection. I want documents that contain a substring in one field (display_url) and that also contain certain keywords in another field (edge_media_to_caption.edges.node.text). The first field is a URL, so I need a wildcard; the only way that seems to work is the pattern .*

However, I'm having problems with the second part of my match, where I use $in; I think it is not working. This second field is a string field containing text.

So I need documents that match a regular expression I provide (I tested this part alone and it works) and that also contain at least one of the words ['. corona. ','. virus. ','. vírus. ','. covid. ','. pandemia. ','. pândemia. '] in the text.

        client = MongoClient('localhost', 27017)
        db = client.basededados
        collection = getattr(db, pdados) 
        pipeline= [{'$project': {"_id": True,
                          'legenda': '$edge_media_to_caption.edges.node.text',
                          'data': '$taken_at_timestamp',
                          'hash': '$tags',
                          'id' :'$display_url'}},
            {'$match': {'$and': [{"id": {"$regex": '/%s/' % nitem[0]}},
                                 {"legenda": {"$in": ['.*corona.*','.*virus.*','.*vírus.*','.*covid.*','.*pandemia.*','.*pândemia.*']}}
                                ]}}
                    ]

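To wildcard match a string, use a regex. With plain strings, $in only matches values that are exactly equal to one of the list entries, so '.*corona.*' is compared literally rather than treated as a pattern. In pure Mongo: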

{$in: [/\.corona\./, ...]}

In pymongo, you can use native Python regexen:

import re

...

{'$in': [re.compile(r'\.corona\.'), ...]}
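Applied to the pipeline from the question, this could look roughly like the sketch below. It assumes the goal is a plain substring match on each keyword, which is what the '.*corona.*' attempt suggests. The helper name build_pipeline and its url_fragment parameter are made up for illustration; the field names are taken from the question.

```python
import re

KEYWORDS = ['corona', 'virus', 'vírus', 'covid', 'pandemia', 'pândemia']


def build_pipeline(url_fragment, keywords=KEYWORDS):
    """Build a pipeline matching a URL substring and at least one keyword.

    build_pipeline and url_fragment are hypothetical names for this sketch.
    """
    # Escape the fragment so characters such as '.' or '?' match literally.
    # No surrounding slashes: in pymongo they would be part of the pattern.
    url_pattern = re.compile(re.escape(url_fragment))

    # An unanchored regex already matches anywhere in the string,
    # so no leading or trailing '.*' is needed.
    keyword_patterns = [re.compile(re.escape(word)) for word in keywords]

    return [
        {'$project': {'_id': True,
                      'legenda': '$edge_media_to_caption.edges.node.text',
                      'data': '$taken_at_timestamp',
                      'hash': '$tags',
                      'id': '$display_url'}},
        # Conditions on different fields in one document are ANDed implicitly.
        {'$match': {'id': url_pattern,
                    'legenda': {'$in': keyword_patterns}}},
    ]


# Usage, assuming `collection` is a pymongo collection as in the question:
# for doc in collection.aggregate(build_pipeline(nitem[0])):
#     print(doc)
```

Because the compiled patterns are ordinary Python objects, they can be checked against sample strings with .search() before the pipeline is sent to the server.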
