Consequences of using $unwind on nested arrays?

Question

Say I have 17,000 documents that have a structure similar to the document below:

{
   someInfo: "blah blah blah",
   // and another dozen or so attributes here, followed by:
   answers:[
      {
          email: "test@test.com,
          values:[
             {value: 1, label: "test1"},
             {value: 2, label: "test2"}    
          ]
      },
      {
          email: "someone@somewhere.com,
          values:[
             {value: 6, label: "test1"},
             {value: 1, label: "test2"}    
          ]
      }
   ]
}

Say I use aggregate to unwind both answers and answers.values like so:

db.participants.aggregate(
   {$unwind: "$answers"},
   {$unwind: "$answers.values"}
);

I assume it would create a fairly large result set in memory since it would essentially be replicating the parent object 17,000 * # of answers * # of values times.

I have been testing a query that does something similar on a development environment and the performance of the query itself is fine, but I'm wondering if I should be concerned about running this on a production environment where the unwound result set could potentially eat up a lot of memory. Mongo's documentation on $unwind goes into how it works, but does not discuss potential performance problems.

Should I be worried about doing this on a production system? Will it slow down other queries against the db?

Answer 1

It is always a good idea to be cognizant of memory resources when $unwind ing because of the replication of data that occurs.

Using $match to narrow down the results to the specific documents you are looking for is of course one way to reduce the amount of memory necessary to hold the returned data.

Another way to reduce the memory footprint is with $project . $project allows you to re-organize the documents in the pipeline so that you only return the elements in which you are interested.

To use your example,

{
  someInfo: "blah blah blah",
  answers: [
    {
      email: "test@test.com",
      values: [
        {value: 1, label: "test1"},
        {value: 2, label: "test2"}    
      ]
    },
    {
      email: "someone@somewhere.com",
      values: [
        {value: 6, label: "test1"},
        {value: 1, label: "test2"}    
      ]
    }
  ]
}

With

db.collection.aggregate([{ $match: { <element>: <value> }}, { $project: { _id: 0, answers: 1}}])

will remove the someInfo and other attributes you may not be interested in. Then you could $project again after unwinding...

db.collection.aggregate([
   { $match: { <element>: <value> }},
   { $project: { _id: 0, answers: 1}},
   { $unwind: "$answers"},
   { $unwind: "$answers.tags"},
   { $project: { e: "$answers.email", v: "$answers.values"}}
])

will return fairly compact results like:

{ e: "test@test.com", v: { value: 1, label: "test1" } }
{ e: "test@test.com", v: { value: 2, label: "test2" } }
{ e: "someone@somewhere.com", v: { value: 6, label: "test1" } }
{ e: "someone@somewhere.com", v: { value: 1, label: "test2" } }

Although the single letter attribute names reduce human-readability, it does cut down on the size of the data that is inflated by lengthy repeated attribute names.

Consequences of using $unwind on nested arrays?

Question

1 answers

solution1
4 ACCPTED 2014-12-29 21:52:07

Consequences of using $unwind on nested arrays?

Question

1 answers

solution1 4 ACCPTED 2014-12-29 21:52:07

solution1
4 ACCPTED 2014-12-29 21:52:07