
Consequences of using $unwind on nested arrays?

Say I have 17,000 documents that have a structure similar to the document below:

{
   someInfo: "blah blah blah",
   // and another dozen or so attributes here, followed by:
   answers:[
      {
          email: "test@test.com",
          values:[
             {value: 1, label: "test1"},
             {value: 2, label: "test2"}    
          ]
      },
      {
          email: "someone@somewhere.com",
          values:[
             {value: 6, label: "test1"},
             {value: 1, label: "test2"}    
          ]
      }
   ]
}

Say I use aggregate to unwind both answers and answers.values like so:

db.participants.aggregate([
   {$unwind: "$answers"},
   {$unwind: "$answers.values"}
]);

I assume it would create a fairly large result set in memory, since it would essentially replicate the parent object once per value, for a total of roughly 17,000 * (# of answers) * (# of values) documents.

I have been testing a query that does something similar in a development environment, and the performance of the query itself is fine, but I'm wondering whether I should be concerned about running it in a production environment, where the unwound result set could potentially eat up a lot of memory. Mongo's documentation on $unwind explains how it works, but does not discuss potential performance problems.

Should I be worried about doing this on a production system? Will it slow down other queries against the db?

It is always a good idea to be cognizant of memory resources when using $unwind, because of the replication of data that occurs.
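
To make the replication concrete, here is a minimal sketch in plain JavaScript that imitates what two successive $unwind stages do to the sample document. The helper names (getPath, setPath, unwindPath) are made up for this illustration, and it is not how MongoDB implements $unwind; for simplicity it drops any document whose field is not an array.

```javascript
// Illustration only: a plain-JavaScript imitation of $unwind on a dotted path.
// Simplification (an assumption of this sketch): documents whose field is
// missing or not an array are dropped.

// Read the value at a dotted path such as "answers.values".
function getPath(doc, path) {
  return path.split(".").reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
}

// Return a shallow copy of doc with the value at the dotted path replaced.
function setPath(doc, path, value) {
  const keys = path.split(".");
  const copy = { ...doc };
  let cur = copy;
  for (let i = 0; i < keys.length - 1; i++) {
    cur[keys[i]] = { ...cur[keys[i]] };
    cur = cur[keys[i]];
  }
  cur[keys[keys.length - 1]] = value;
  return copy;
}

// Emit one output document per array element; every copy keeps all other fields.
function unwindPath(docs, path) {
  return docs.flatMap((doc) => {
    const arr = getPath(doc, path);
    return Array.isArray(arr) ? arr.map((item) => setPath(doc, path, item)) : [];
  });
}

const sampleDocs = [
  {
    someInfo: "blah blah blah",
    answers: [
      { email: "test@test.com", values: [{ value: 1, label: "test1" }, { value: 2, label: "test2" }] },
      { email: "someone@somewhere.com", values: [{ value: 6, label: "test1" }, { value: 1, label: "test2" }] },
    ],
  },
];

const unwound = unwindPath(unwindPath(sampleDocs, "answers"), "answers.values");

// One input document with 2 answers of 2 values each becomes 2 * 2 documents,
// and each of them still carries someInfo.
console.log(unwound.length);
console.log(unwound.every((d) => d.someInfo === "blah blah blah"));
```

Every top-level attribute that is still in the document at the point of the $unwind is copied into each output document, which is why trimming the document beforehand matters.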

Using $match to narrow down the results to the specific documents you are looking for is of course one way to reduce the amount of memory necessary to hold the returned data.

Another way to reduce the memory footprint is with $project. $project allows you to re-organize the documents in the pipeline so that you only return the elements in which you are interested.

To use your example,

{
  someInfo: "blah blah blah",
  answers: [
    {
      email: "test@test.com",
      values: [
        {value: 1, label: "test1"},
        {value: 2, label: "test2"}    
      ]
    },
    {
      email: "someone@somewhere.com",
      values: [
        {value: 6, label: "test1"},
        {value: 1, label: "test2"}    
      ]
    }
  ]
}

Running

db.collection.aggregate([{ $match: { <element>: <value> }}, { $project: { _id: 0, answers: 1}}])

will remove someInfo and any other attributes you are not interested in. Projecting again after unwinding:

db.collection.aggregate([
   { $match: { <element>: <value> }},
   { $project: { _id: 0, answers: 1}},
   { $unwind: "$answers"},
   { $unwind: "$answers.values"},
   { $project: { e: "$answers.email", v: "$answers.values"}}
])

will return fairly compact results like:

{ e: "test@test.com", v: { value: 1, label: "test1" } }
{ e: "test@test.com", v: { value: 2, label: "test2" } }
{ e: "someone@somewhere.com", v: { value: 6, label: "test1" } }
{ e: "someone@somewhere.com", v: { value: 1, label: "test2" } }

Although single-letter attribute names reduce human readability, they do cut down on the size of the data, which is otherwise inflated by lengthy attribute names repeated in every document.
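
As a rough way to see the effect of the shorter names, the sketch below compares the serialized size of the four result documents with long and short keys. It uses JSON byte length in Node.js as a stand-in for BSON size, which is an assumption of this illustration; the real numbers in MongoDB will differ, but field names are stored in every document in both formats.

```javascript
// Illustration only: JSON byte length is used as a rough proxy for BSON size.
const longNames = [
  { email: "test@test.com", values: { value: 1, label: "test1" } },
  { email: "test@test.com", values: { value: 2, label: "test2" } },
  { email: "someone@somewhere.com", values: { value: 6, label: "test1" } },
  { email: "someone@somewhere.com", values: { value: 1, label: "test2" } },
];

// Same data, with "email" renamed to "e" and "values" renamed to "v".
const shortNames = longNames.map((d) => ({ e: d.email, v: d.values }));

// Size in bytes of the UTF-8 JSON serialization of a list of documents.
const jsonBytes = (docs) => Buffer.byteLength(JSON.stringify(docs), "utf8");

const longBytes = jsonBytes(longNames);
const shortBytes = jsonBytes(shortNames);

console.log("long names: ", longBytes, "bytes");
console.log("short names:", shortBytes, "bytes");
console.log("saved per document:", (longBytes - shortBytes) / longNames.length, "bytes");
```

The saving per document is small, so it only adds up when the unwound result set is large; whether it is worth the loss of readability depends on how many documents the pipeline produces.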

