
PyMongo aggregate: filter by number of fields (dynamic)

Let's say I have an aggregation pipeline that for now leads to a collection with documents built like this:

{'name': 'Paul',
 'football_position': 'Keeper',
 'basketball_position': 4,...}

Obviously not everyone plays every sport, so some of these fields are missing from some documents. For someone who plays no sport at all, the document would be

{'name': 'Louis'}

What I want to do is keep only the people who play at least one sport, inside my aggregation pipeline.

I know that this is easy to check for one field with {'$match': {'football_position': {'$exists': True}}} , but I want to check if any of these fields exist.

I found an old, somewhat similar question ( Check for existence of multiple fields in MongoDB document ), but it checks for the existence of all fields, which, while bothersome, could be achieved by chaining multiple $match stages. Also, maybe MongoDB now has a better way to handle this than writing a custom JavaScript function.

maybe mongoDB has now a better way to handle this

Yes, you can now use the aggregation operator $objectToArray ( SERVER-23310 ) to turn a document into an array of key/value pairs, which lets you count a dynamic number of fields. Combining this operator with $addFields is quite useful here.

Both operators are available in MongoDB v3.4.4+. Using your documents above as an example:

db.sports.aggregate([
          { $addFields : 
             { "numFields" : 
               { $size:
                 { $objectToArray:"$$ROOT"}
               }
             }
          }, 
          { $match: 
            { numFields: 
              {$gt:2}
            }
          }
])

The pipeline above first adds a field called numFields . $objectToArray converts the document into an array with one key/value entry per top-level field, so the $size of that array is the number of fields in the document. The second stage keeps only documents with more than two fields: every document already has _id and name , so anything beyond those two means at least one sport field is present.
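To make the counting concrete, here is a plain-Python sketch that mimics what the two stages compute on the example documents. It does not talk to MongoDB; the helper names object_to_array and num_fields and the _id values are made up for illustration.

```python
def object_to_array(doc):
    """Mimic $objectToArray: one {"k": ..., "v": ...} entry per top-level field."""
    return [{"k": key, "v": value} for key, value in doc.items()]


def num_fields(doc):
    """Mimic {$size: {$objectToArray: "$$ROOT"}}: count the top-level fields."""
    return len(object_to_array(doc))


# Example documents from the question, with an _id added as MongoDB would.
paul = {"_id": 1, "name": "Paul", "football_position": "Keeper", "basketball_position": 4}
louis = {"_id": 2, "name": "Louis"}

# Mimic {$match: {numFields: {$gt: 2}}}: keep documents with more than _id and name.
players = [doc for doc in (paul, louis) if num_fields(doc) > 2]
print([doc["name"] for doc in players])  # prints ['Paul']
```

Louis has exactly two fields ( _id and name ), so he is filtered out, while Paul has four and is kept.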

In PyMongo, the above aggregation pipeline would look like:

cursor = collection.aggregate([
    {"$addFields": {"numFields": {"$size": {"$objectToArray": "$$ROOT"}}}},
    {"$match": {"numFields": {"$gt": 2}}}
])
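If the number of always-present fields differs from two, the threshold can be made a parameter. The following is only a sketch: the function name and the base_field_count parameter are assumptions, and the trailing $project stage (which drops the helper field from the results) is an optional addition that is not part of the pipeline above.

```python
def extra_fields_pipeline(base_field_count=2):
    """Build stages that keep documents with more than base_field_count fields.

    base_field_count is the number of fields every document has
    (here _id and name), so anything above it means an extra field exists.
    """
    return [
        # Count the top-level fields of each document.
        {"$addFields": {"numFields": {"$size": {"$objectToArray": "$$ROOT"}}}},
        # Keep only documents that have at least one field beyond the base ones.
        {"$match": {"numFields": {"$gt": base_field_count}}},
        # Optional: hide the helper field from the output.
        {"$project": {"numFields": 0}},
    ]


pipeline = extra_fields_pipeline()
# With a real collection this would be: cursor = collection.aggregate(pipeline)
```

Since these stages count every top-level field, they only work as a "plays at least one sport" filter when all non-base fields are sport fields.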

Having said that, if your use case allows it, I would suggest reconsidering your data model for easier access, i.e. add a new field that keeps track of the number of sports and update it whenever a new sport position is added.
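One way such a counter might be maintained is sketched below. The field name num_sports and the helper add_sport_update are hypothetical, and the filter assumes people are identified by name as in the example documents.

```python
def add_sport_update(name, sport_field, position):
    """Build the (filter, update) pair for adding a new sport to a person.

    The filter only matches when the sport field is not set yet, so the
    counter is incremented once per sport rather than on every write.
    """
    filter_doc = {"name": name, sport_field: {"$exists": False}}
    update_doc = {
        "$set": {sport_field: position},
        "$inc": {"num_sports": 1},  # $inc creates the field if it is missing
    }
    return filter_doc, update_doc


filter_doc, update_doc = add_sport_update("Louis", "football_position", "Striker")
# With a real collection this would be: collection.update_one(filter_doc, update_doc)
```

With the counter in place, the filter becomes a plain {"$match": {"num_sports": {"$gte": 1}}} . Note that this sketch does not change the position of a sport that is already set; that would need a separate update without the $inc .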
