Using unwind in multiple nested arrays in mongodb

Question

I have documents stored in a MongoDB (version 3.2) collection with the following schema:

{
    "_id" : ObjectId("585a42b5b7e79d1c0c533f1f"),
    "instanceId" : "i-b385a9bd",
    "DiskSpaceAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 4.32112884521484,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 4.32107543945312,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 4.32101821899414,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "DiskSpaceUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 3.33073806762695,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 3.33079147338867,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 0.753532409667969,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 0.753063201904297,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 19.5049320125989,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 19.5078950721357,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 19.5068086169722,
                "Unit" : "Percent"
            }
        ]
    },
    "DiskSpaceUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 42.9914921714092,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 42.9921815029693,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 42.992920072498,
                "Unit" : "Percent"
            }
        ]
    },
    "SwapUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T13:12:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }
        ]
    },
    "SwapUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T13:06:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T13:24:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 3.10872268676758,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 3.10919189453125,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 3.10895538330078,
                "Unit" : "Gigabytes"
            }
        ]
    }
}

I am trying to use the MongoDB aggregation framework, and this is my query:

db.collectionSchema.aggregate([
    {
     $match :{ "instanceId" : "i-b385a9bd" }
    },
    {
      $unwind : "$DiskSpaceAvailable.Datapoints"   
    },
     {
      $unwind : "$DiskSpaceUtilization.Datapoints"   
    },
    {
      $unwind : "$DiskSpaceUsed.Datapoints"   
    },
    {
      $unwind : "$MemoryUsed.Datapoints"   
    },
    {
      $unwind : "$SwapUtilization.Datapoints"   
    },
    {
      $unwind : "$MemoryAvailable.Datapoints"   
    },
    {
      $unwind : "$MemoryUtilization.Datapoints"   
    },
    {
      $unwind : "$SwapUsed.Datapoints"   
    },
    {
      $group : { _id : "$instanceId" , 
               DiskSpaceAvailable : { "$avg" : "$DiskSpaceAvailable.Datapoints.Average" } , 
               DiskSpaceAvailableUnit : { "$addToSet" : "$DiskSpaceAvailable.Datapoints.Unit" },
               DiskSpaceUtilization : {"$avg" : "$DiskSpaceUtilization.Datapoints.Average"},
               DiskSpaceUtilizationUnit : {"$addToSet" : "$DiskSpaceUtilization.Datapoints.Unit"},
               DiskSpaceUsed : {"$avg" : "$DiskSpaceUsed.Datapoints.Average"},
               DiskSpaceUsedUnit : {"$addToSet" : "$DiskSpaceUsed.Datapoints.Unit"},
               MemoryUsed :{"$avg" : "$MemoryUsed.Datapoints.Average"},
               MemoryUsedUnit:{"$addToSet" : "$MemoryUsed.Datapoints.Unit"},
               SwapUtilization:{"$avg" : "$SwapUtilization.Datapoints.Average"},
               SwapUtilizationUnit:{"$addToSet" : "$SwapUtilization.Datapoints.Unit"},
               MemoryAvailable:{"$avg" : "$MemoryAvailable.Datapoints.Average"},
               MemoryAvailableUnit:{"$addToSet" : "$MemoryAvailable.Datapoints.Unit"},
               MemoryUtilization:{"$avg" : "$MemoryUtilization.Datapoints.Average"},
               MemoryUtilizationUnit: {"$addToSet" : "$MemoryUtilization.Datapoints.Unit"},
               SwapUsed:{"$avg" : "$SwapUsed.Datapoints.Average"},
               SwapUsedUnit: {"$addToSet" : "$SwapUsed.Datapoints.Unit"}
               }  
    },
        {
            $project : { _id:1 , 
              DiskSpaceAvailable:1 , 
              DiskSpaceAvailableUnit : 1,
              DiskSpaceUtilization : 1,
              DiskSpaceUtilizationUnit : 1,
              DiskSpaceUsed : 1,
              DiskSpaceUsedUnit : 1,
              MemoryUsed :1,
              MemoryUsedUnit:1,
              SwapUtilization:1,
              SwapUtilizationUnit:1,
              MemoryAvailable:1,
              MemoryAvailableUnit:1,
              MemoryUtilization:1,
              MemoryUtilizationUnit: 1,
              SwapUsed:1,
              SwapUsedUnit:1
              }
        }
    ]);

This query never returns; it runs indefinitely. With only the first four $unwind stages it works and takes about 3-4 seconds, but as soon as I add the fifth $unwind stage the query hangs and does not return. I am sure I am doing something wrong but cannot put my finger on it. Can someone please point out my mistake?

Any suggestions are welcome; I am willing to change the schema as well.

Thank you :)

Answer 1

That is a lot of data in a single document. Unwinding this many nested arrays and then averaging them adds not only to the response time but also to the resources consumed.
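To see where the cost comes from: each $unwind emits one document per array element, so chaining $unwind stages over independent arrays multiplies the document count at every stage. Below is a rough sketch in plain JavaScript; the helper name countUnwoundDocs is made up for illustration, and the "100 datapoints per metric" figure is only an assumption, not a number from your collection.

```javascript
// Hypothetical helper: estimates how many documents reach $group when every
// listed array is unwound in sequence (each $unwind multiplies the count).
function countUnwoundDocs(arrayLengths) {
    var total = 1;
    for (var i = 0; i < arrayLengths.length; i++) {
        total *= arrayLengths[i];
    }
    return total;
}

// Array lengths from the sample document in the question, in pipeline order:
// DiskSpaceAvailable, DiskSpaceUtilization, DiskSpaceUsed, MemoryUsed,
// SwapUtilization, MemoryAvailable, MemoryUtilization, SwapUsed
var sampleLengths = [3, 3, 2, 2, 4, 3, 3, 3];
var sampleTotal = countUnwoundDocs(sampleLengths); // 3888

// Assumed for illustration only: 100 datapoints in each of the 8 metrics
var hundredEach = countUnwoundDocs([100, 100, 100, 100, 100, 100, 100, 100]); // 1e16
```

Even the small sample document expands to 3888 documents before $group, and the count grows by another factor with every extra $unwind, which would be consistent with four stages finishing in seconds and the fifth never returning.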

To make retrieval substantially faster, I suggest computing the average when you insert a datapoint instead of computing it on retrieval.

For example, when the first datapoint is added (with an average of 5), the overall average of DiskSpaceAvailable is 5; when a second datapoint is added (with an average of 2), the overall average becomes (5 + 2) / 2 = 3.5.

The data design would look something like this:
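The same arithmetic can be done incrementally, without re-reading the earlier datapoints, as long as you also know how many values went into the current average. A minimal sketch, assuming you keep that count somewhere (the helper name updateAverage is hypothetical):

```javascript
// Hypothetical helper: folds one new value into an average that was
// computed from `count` earlier values.
function updateAverage(currentAverage, count, newValue) {
    return (currentAverage * count + newValue) / (count + 1);
}

// Reproduces the example: first value 5, then a second value 2
var afterFirst = updateAverage(0, 0, 5);           // 5
var afterSecond = updateAverage(afterFirst, 1, 2); // 3.5
```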

{
    "_id" : ObjectId("585a42b5b7e79d1c0c533f1f"),
    "instanceId" : "i-b385a9bd",
    "DiskSpaceAvailableUnit": "Gigabytes",
    "DiskSpaceAvailableAverage": <The computed average value>,
    "DiskSpaceAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 4.32112884521484,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 4.32107543945312,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 4.32101821899414,
                "Unit" : "Gigabytes"
            }
        ]
    },
    ....
}

So you only have to fetch the data without doing any computation, and the response will be quite fast (much faster than your current query).

Such a structure does, however, increase the computation time and complexity of inserts and updates. But if fast retrieval is of prime importance, you should take this structure into consideration.
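To give an idea of what the extra work on insert might look like, here is a sketch that builds the update document for one new datapoint. It assumes the application has already read the stored average and the number of datapoints; the helper buildDatapointUpdate and the "...Count" field are my own assumptions and are not part of the schema above.

```javascript
// Hypothetical helper: builds a $push/$set update for one new datapoint.
// Assumes the caller already knows the stored average and how many
// datapoints it covers; the "<metric>Count" field is an assumed addition.
function buildDatapointUpdate(metric, storedAverage, storedCount, datapoint) {
    var newAverage =
        (storedAverage * storedCount + datapoint.Average) / (storedCount + 1);
    var update = { $push: {}, $set: {} };
    update.$push[metric + ".Datapoints"] = datapoint;    // append the raw datapoint
    update.$set[metric + "Average"] = newAverage;        // precomputed average
    update.$set[metric + "Unit"] = datapoint.Unit;       // unit kept at top level
    update.$set[metric + "Count"] = storedCount + 1;     // needed for the next update
    return update;
}

// Illustrative values only: stored average 5 over 1 datapoint, new value 2
var update = buildDatapointUpdate("DiskSpaceAvailable", 5, 1, {
    Timestamp: new Date("2016-12-20T13:08:00.000Z"),
    Average: 2,
    Unit: "Gigabytes"
});
// In the mongo shell this could then be applied with, for example:
// db.collectionSchema.update({ instanceId: "i-b385a9bd" }, update);
```

Note that reading the current values and then writing the update is two separate steps, so concurrent writers to the same document would need some form of coordination.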
