Using $unwind on multiple nested arrays in MongoDB

I have stored objects in my MongoDB (version 3.2) collection using the following schema:

{
    "_id" : ObjectId("585a42b5b7e79d1c0c533f1f"),
    "instanceId" : "i-b385a9bd",
    "DiskSpaceAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 4.32112884521484,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 4.32107543945312,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 4.32101821899414,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "DiskSpaceUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 3.33073806762695,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 3.33079147338867,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 0.753532409667969,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 0.753063201904297,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 19.5049320125989,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 19.5078950721357,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 19.5068086169722,
                "Unit" : "Percent"
            }
        ]
    },
    "DiskSpaceUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 42.9914921714092,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 42.9921815029693,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 42.992920072498,
                "Unit" : "Percent"
            }
        ]
    },
    "SwapUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T13:12:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }
        ]
    },
    "SwapUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T13:06:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T13:24:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 3.10872268676758,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 3.10919189453125,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 3.10895538330078,
                "Unit" : "Gigabytes"
            }
        ]
    }
}

I am trying to use the MongoDB aggregation framework, and the following is my query:

db.collectionSchema.aggregate([
    {
        $match : { "instanceId" : "i-b385a9bd" }
    },
    {
        $unwind : "$DiskSpaceAvailable.Datapoints"
    },
    {
        $unwind : "$DiskSpaceUtilization.Datapoints"
    },
    {
        $unwind : "$DiskSpaceUsed.Datapoints"
    },
    {
        $unwind : "$MemoryUsed.Datapoints"
    },
    {
        $unwind : "$SwapUtilization.Datapoints"
    },
    {
        $unwind : "$MemoryAvailable.Datapoints"
    },
    {
        $unwind : "$MemoryUtilization.Datapoints"
    },
    {
        $unwind : "$SwapUsed.Datapoints"
    },
    {
        $group : {
            _id : "$instanceId",
            DiskSpaceAvailable : { "$avg" : "$DiskSpaceAvailable.Datapoints.Average" },
            DiskSpaceAvailableUnit : { "$addToSet" : "$DiskSpaceAvailable.Datapoints.Unit" },
            DiskSpaceUtilization : { "$avg" : "$DiskSpaceUtilization.Datapoints.Average" },
            DiskSpaceUtilizationUnit : { "$addToSet" : "$DiskSpaceUtilization.Datapoints.Unit" },
            DiskSpaceUsed : { "$avg" : "$DiskSpaceUsed.Datapoints.Average" },
            DiskSpaceUsedUnit : { "$addToSet" : "$DiskSpaceUsed.Datapoints.Unit" },
            MemoryUsed : { "$avg" : "$MemoryUsed.Datapoints.Average" },
            MemoryUsedUnit : { "$addToSet" : "$MemoryUsed.Datapoints.Unit" },
            SwapUtilization : { "$avg" : "$SwapUtilization.Datapoints.Average" },
            SwapUtilizationUnit : { "$addToSet" : "$SwapUtilization.Datapoints.Unit" },
            MemoryAvailable : { "$avg" : "$MemoryAvailable.Datapoints.Average" },
            MemoryAvailableUnit : { "$addToSet" : "$MemoryAvailable.Datapoints.Unit" },
            MemoryUtilization : { "$avg" : "$MemoryUtilization.Datapoints.Average" },
            MemoryUtilizationUnit : { "$addToSet" : "$MemoryUtilization.Datapoints.Unit" },
            SwapUsed : { "$avg" : "$SwapUsed.Datapoints.Average" },
            SwapUsedUnit : { "$addToSet" : "$SwapUsed.Datapoints.Unit" }
        }
    },
    {
        $project : {
            _id : 1,
            DiskSpaceAvailable : 1,
            DiskSpaceAvailableUnit : 1,
            DiskSpaceUtilization : 1,
            DiskSpaceUtilizationUnit : 1,
            DiskSpaceUsed : 1,
            DiskSpaceUsedUnit : 1,
            MemoryUsed : 1,
            MemoryUsedUnit : 1,
            SwapUtilization : 1,
            SwapUtilizationUnit : 1,
            MemoryAvailable : 1,
            MemoryAvailableUnit : 1,
            MemoryUtilization : 1,
            MemoryUtilizationUnit : 1,
            SwapUsed : 1,
            SwapUsedUnit : 1
        }
    }
]);

This query never returns and appears to run indefinitely. With only the first four $unwind stages it works and takes about 3-4 seconds, but as soon as I add the fifth $unwind stage the query no longer returns. I am sure I am doing something wrong, but I cannot put my finger on it. Can someone please point out my mistake?

Any suggestions are most welcome; I am willing to change the schema as well.

Thank you :)

That is a lot of data in a single document. Unwinding this many nested arrays and then computing averages over them increases not only the response time but also the resources consumed.
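
To make the cost concrete: each $unwind stage emits one document per array element, so chaining several of them multiplies the number of intermediate documents by each array's length. The following is only a rough sketch in plain JavaScript (runnable in Node.js), not something from the original post; the helper name documentsAfterUnwinds is hypothetical, and the lengths are those of the small sample document in the question.

```javascript
// Array lengths counted from the sample document shown in the question.
const datapointCounts = {
  DiskSpaceAvailable: 3,
  DiskSpaceUtilization: 3,
  DiskSpaceUsed: 2,
  MemoryUsed: 2,
  SwapUtilization: 4,
  MemoryAvailable: 3,
  MemoryUtilization: 3,
  SwapUsed: 3
};

// Hypothetical helper: every $unwind multiplies the number of documents
// flowing through the pipeline by the length of the unwound array.
function documentsAfterUnwinds(counts) {
  return Object.values(counts).reduce((total, n) => total * n, 1);
}

console.log(documentsAfterUnwinds(datapointCounts)); // 3888
```

Even this tiny sample turns one document into 3888 before $group runs. If each of the eight arrays held 100 datapoints (an assumed figure, not one given in the question), the same product would be 100^8 = 10^16 documents, which would explain why the query appears never to finish.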

To make retrieval fast, I suggest computing the average when a datapoint is inserted instead of computing it at retrieval time.

For example: when the first datapoint is added (with an Average of 5), the overall average of DiskSpaceAvailable is 5. When a second datapoint is added (with an Average of 2), the overall average becomes (5 + 2) / 2 = 3.5.

The data design would look something like this:
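
The running average in the example above can be updated without re-reading every datapoint, provided the number of values seen so far is kept alongside the average. Here is a minimal sketch in plain JavaScript; the function updateAverage and the count it tracks are assumptions for illustration and are not part of the original schema.

```javascript
// Hypothetical helper: fold one new value into an existing average.
// currentCount is the number of values already included in currentAverage.
function updateAverage(currentAverage, currentCount, newValue) {
  const newCount = currentCount + 1;
  return {
    average: (currentAverage * currentCount + newValue) / newCount,
    count: newCount
  };
}

// Reproduces the example: first value 5, then value 2.
let state = updateAverage(0, 0, 5);                   // average 5, count 1
state = updateAverage(state.average, state.count, 2); // average 3.5, count 2
console.log(state.average);
```

Note that the previous average alone is not enough; without the count, a third value could not be weighted correctly against the first two.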

{
    "_id" : ObjectId("585a42b5b7e79d1c0c533f1f"),
    "instanceId" : "i-b385a9bd",
    "DiskSpaceAvailableUnit": "Gigabytes",
    "DiskSpaceAvailableAverage": <The computed average value>,
    "DiskSpaceAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 4.32112884521484,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 4.32107543945312,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 4.32101821899414,
                "Unit" : "Gigabytes"
            }
        ]
    },
    ....
}

So you only have to fetch the data without doing any computation, and the response will be much faster than with your current query.

However, such a structure increases the computation time and complexity of inserts and updates. If fast retrieval is of prime importance, it is worth considering.
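
One possible way to keep that write-side cost small is to store a running sum and count per metric and derive the average as sum / count when reading. This is a variation on the design above, not exactly what it shows: the field names ending in Sum and Count and the helper buildDatapointUpdate are hypothetical. The sketch only builds the update document, using the standard $push, $inc and $set update operators.

```javascript
// Hypothetical helper: build one update document that appends a datapoint
// and maintains per-metric running totals (assumed field names).
function buildDatapointUpdate(metric, datapoint) {
  const update = { $push: {}, $inc: {}, $set: {} };
  update.$push[metric + ".Datapoints"] = datapoint;  // keep the raw datapoint
  update.$inc[metric + "Sum"] = datapoint.Average;   // running sum of values
  update.$inc[metric + "Count"] = 1;                 // number of values stored
  update.$set[metric + "Unit"] = datapoint.Unit;     // unit stored once per metric
  return update;
}

const update = buildDatapointUpdate("DiskSpaceAvailable", {
  Timestamp: new Date("2016-12-20T13:08:00.000Z"),   // made-up example value
  Average: 4.32,
  Unit: "Gigabytes"
});
console.log(JSON.stringify(update, null, 2));

// In the mongo shell this could then be applied with something like:
// db.collectionSchema.updateOne({ instanceId: "i-b385a9bd" }, update);
```

Because $inc is applied by the server, concurrent writers would not need to read the old average first; the trade-off is one extra division at read time.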
