Trouble understanding mongo aggregation

I am trying to list all the virtual machines (VMs) in my MongoDB database that use a certain data store, EMC_123. I have this script, but it also lists VMs that do not use the data store EMC_123.

#!/usr/bin/env python
import pprint
import pymongo
def run_query():
    server = '127.0.0.1'
    client = pymongo.MongoClient("mongodb://%s:27017/" % server)
    db = client["data_center_test"]
    collection = db["data_centers"]
    pipeline = [
        { "$match": { "clusters.hosts.vms.data_stores.name" : "EMC_123"}},
        { "$group": { "_id" : "$clusters.hosts.vms.name" }}
    ]
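    # Create the printer once, outside the loop, so it also exists when
    # the aggregation returns no documents.
    pp = pprint.PrettyPrinter()
    for doc in db.data_centers.aggregate(pipeline):
        pp.pprint(doc)
    # Print the query plan for the same pipeline.
    pp.pprint(db.command('aggregate', 'data_centers', pipeline=pipeline, explain=True))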


def main():
    run_query()
    return 0

# Start program
if __name__ == "__main__":
    main()

I assume there is something wrong with my pipeline. Here is the plan that gets printed out:

{u'ok': 1.0,
 u'stages': [{u'$cursor': {u'fields': {u'_id': 0,
                                   u'clusters.hosts.vms.name': 1},
                       u'query': {u'clusters.hosts.vms.data_stores.name': u'EMC_123'},
                       u'queryPlanner': {u'indexFilterSet': False,
                                         u'namespace': u'data_center_test.data_centers',
                                         u'parsedQuery': {u'clusters.hosts.vms.data_stores.name': {u'$eq': u'EMC_123'}},
                                         u'plannerVersion': 1,
                                         u'rejectedPlans': [],
                                         u'winningPlan': {u'direction': u'forward',
                                                          u'filter': {u'clusters.hosts.vms.data_stores.name': {u'$eq': u'EMC_123'}},
                                                          u'stage': u'COLLSCAN'}}}},
         {u'$group': {u'_id': u'$clusters.hosts.vms.name'}}]}

UPDATE:

Here is a skeleton of what the document looks like:

{
   "name" : "data_center_name",
   "clusters" : [
      {
         "hosts" : [
            {
               "name" : "esxi-hostname",
               "vms" : [
                  {
                     "data_stores" : [ { "name" : "EMC_123" } ],
                     "name" : "vm-name1",
                     "networks" : [ { "name" : "vlan334" } ]
                  },
                  {
                     "data_stores" : [ { "name" : "some_other_data_store" } ],
                     "name" : "vm-name2",
                     "networks" : [ { "name" : "vlan334" } ]
                  }
               ]
            }
         ],
         "name" : "cluster_name"
      }
   ]
}

The problem I am seeing is that vm-name2 shows up in the results even though it doesn't have EMC_123 as a data store.
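A likely explanation is that $match selects whole documents rather than individual array elements: a document passes as soon as any VM in it uses EMC_123, and the $group key '$clusters.hosts.vms.name' then refers to the names of every VM in that document. The following is a minimal pure-Python sketch of that behaviour on the skeleton document above. It does not talk to MongoDB, and all function names (iter_vms, document_matches, and so on) are hypothetical helpers made up for illustration. It also flattens the VM names into one list, whereas MongoDB would return them as nested arrays.

```python
# Skeleton document from the question, as a plain Python dict.
doc = {
    "name": "data_center_name",
    "clusters": [{
        "name": "cluster_name",
        "hosts": [{
            "name": "esxi-hostname",
            "vms": [
                {"name": "vm-name1",
                 "data_stores": [{"name": "EMC_123"}],
                 "networks": [{"name": "vlan334"}]},
                {"name": "vm-name2",
                 "data_stores": [{"name": "some_other_data_store"}],
                 "networks": [{"name": "vlan334"}]},
            ],
        }],
    }],
}


def iter_vms(document):
    """Yield every VM sub-document, walking clusters -> hosts -> vms."""
    for cluster in document.get("clusters", []):
        for host in cluster.get("hosts", []):
            for vm in host.get("vms", []):
                yield vm


def uses_data_store(vm, store_name):
    """True if the VM lists store_name among its data stores."""
    return any(ds.get("name") == store_name for ds in vm.get("data_stores", []))


def document_matches(document, store_name):
    """Rough analogue of the $match stage: true if ANY VM uses the store."""
    return any(uses_data_store(vm, store_name) for vm in iter_vms(document))


def all_vm_names(document):
    """Rough analogue of the $group key: the names of ALL VMs in the document."""
    return [vm["name"] for vm in iter_vms(document)]


def vm_names_using(document, store_name):
    """What the question asks for: only the VMs that use the store."""
    return [vm["name"] for vm in iter_vms(document)
            if uses_data_store(vm, store_name)]


if document_matches(doc, "EMC_123"):
    print(all_vm_names(doc))           # still includes vm-name2
print(vm_names_using(doc, "EMC_123"))  # only the VM that uses EMC_123
```

Filtering has to happen per VM, not per document, which is what unwinding the nested arrays before matching achieves.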

UPDATE 2:

OK, I was able to write a mongo shell query that does what I want. It is a little ugly:

db.data_centers.aggregate(
    {$unwind: '$clusters'},
    {$unwind: '$clusters.hosts'},
    {$unwind: '$clusters.hosts.vms'},
    {$unwind: '$clusters.hosts.vms.data_stores'},
    {$match: {"clusters.hosts.vms.data_stores.name": "EMC_123"}}
)

I found this approach in the second answer to the SO question "MongoDB Projection of Nested Arrays".

Based on the answers to that question, I had to change my pipeline to this:

pipeline = [
    {'$unwind': '$clusters'},
    {'$unwind': '$clusters.hosts'},
    {'$unwind': '$clusters.hosts.vms'},
    {'$unwind': '$clusters.hosts.vms.data_stores'},
    {'$match': {"clusters.hosts.vms.data_stores.name": "EMC_123"}}
]
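To get back to a plain list of VM names, as the original script attempted, the unwound pipeline could be followed by a $group stage. Here is a minimal sketch, assuming the document layout shown in the skeleton. The build_vm_pipeline function is a hypothetical helper, and both the leading $match (which only skips documents with no matching VM before unwinding) and the trailing $group are additions that are not part of the pipeline above.

```python
def build_vm_pipeline(store_name):
    """Return an aggregation pipeline listing the VMs that use store_name."""
    path = "clusters.hosts.vms.data_stores.name"
    return [
        # Optional early filter: drop documents with no matching VM at all.
        {"$match": {path: store_name}},
        # Flatten each nested array so every output document holds
        # exactly one cluster, one host, one VM and one data store.
        {"$unwind": "$clusters"},
        {"$unwind": "$clusters.hosts"},
        {"$unwind": "$clusters.hosts.vms"},
        {"$unwind": "$clusters.hosts.vms.data_stores"},
        # Now the same condition filters individual VM/data-store pairs.
        {"$match": {path: store_name}},
        # Collapse to one result per VM name.
        {"$group": {"_id": "$clusters.hosts.vms.name"}},
    ]


pipeline = build_vm_pipeline("EMC_123")
```

The resulting list can be passed to db.data_centers.aggregate(pipeline) exactly as in the original script.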
