MongoDb: How to get a field (sub document) from a document?

Question

Consider this example collection:

 {
    "_id:"0,
    "firstname":"Tom",
    "children" : {
                    "childA":{
                                "toys":{
                                        'toy 1':'batman',
                                        'toy 2':'car',
                                        'toy 3':'train',
                                        }
                                "movies": {
                                        'movie 1': "Ironman"
                                        'movie 2': "Deathwish"
                                        }
                                },
                    "childB":{
                                "toys":{
                                        'toy 1':'doll',
                                        'toy 2':'bike',
                                        'toy 3':'xbox',
                                        }
                                "movies": {
                                        'movie 1': "Frozen"
                                        'movie 2': "Barbie"
                                        }
                                }
                    }
}

Now I would like to retrieve ONLY the movies from a particular document.

I have tried something like this:

movies = users.find_one({'_id': 0}, {'_id': 0, 'children.ChildA.movies': 1})

However, I get the whole field structure from 'children' down to 'movies' and it's content. How do I just do a query and retrieve only the content of 'movies'?

To be specific I want to end up with this:

                                       {
                                        'movie 1': "Frozen"
                                        'movie 2': "Barbie"
                                        }

Answer 1

The problem here is your current data structure is not really great for querying. This is mostly because you are using "keys" to actually represent "data points", and while it might initially seem to be a logical idea it is actually a very bad practice.

So rather than do something like assign "childA" and "childB" as keys of an object or "sub-document", you are better off assigning these are "values" to a generic key name in a structure like this:

 {
    "_id:"0,
    "firstname":"Tom",
    "children" : [
        { 
            "name": "childA", 
            "toys": [
                "batman",
                "car",
                "train"
            ],
            "movies": [
                "Ironman"
                "Deathwish"
            ]
        },
        {
            "name": "childB",
            "toys": [
                "doll",
                "bike",
                "xbox",
            ],
            "movies": [
                "Frozen",
                "Barbie"
            ]
        }
    ]
}

Not the best as there are nested arrays, which can be a potential problem but there are workarounds to this as well ( but later ), but the main point here is this is a lot better than defining the data in "keys". And the main problem with "keys" that are not consistently named is that MongoDB does not generally allow any way to "wildcard" these names, so you are stuck with naming and "absolute path" in order to access elements as in:

children -> childA -> toys
children -> childB -> toys

And that in a nutshell is bad , and compared to this:

"children.toys"

From the sample prepared above, then I would say that is a whole lot better approach to organizing your data.

Even so, just getting back something such as a "unique list of movies" is out of scope for standard .find() type queries in MongoDB. This actually requires something more of "document manipulation" and is well supported in the aggregation framework for MongoDB. This has extensive capabilities for manipulation that is not present in the query methods, and as a per document response with the above structure then you can do this:

db.collection.aggregate([
    # De-normalize the array content first
    { "$unwind": "$children" },

    # De-normalize the content from the inner array as well
    { "$unwind": "$children.movies" },

    # Group back, well optionally, but just the "movies" per document
    { "$group": {
        "_id": "$_id",
        "movies": { "$addToSet": "$children.movies" }
    }}
])

So now the "list" response in the document only contains the "unique" movies, which corresponds more to what you are asking. Alternately you could just $push instead and make a "non-unique" list. But stupidly that is actually the same as this:

db.collection.find({},{ "_id": False, "children.movies": True })

As a "collection wide" concept, then you could simplify this a lot by simply using the .distinct() method. Which basically forms a list of "distinct" keys based on the input you provide. This playes with arrays really well:

db.collection.distinct("children.toys")

And that is essentially a collection wide analysis of all the "distinct" occurrences for each"toys" value in the collection, and returned as a simple "array".

But as for you existing structure, it deserves a solution to explain, but you really must understand that the explanation is horrible. The problem here is that the "native" and optimized methods available to general queries and aggregation methods are not available at all and the only option available is JavaScript based processing. Which even though a little better through "v8" engine integration, is still really a complete slouch when compared side by side with native code methods.

So from the "original" form that you have, ( JavaScript form, functions have to be so easy to translate") :

 db.collection.mapReduce(
     // Mapper
     function() {
         var id this._id;
             children = this.children;

         Object.keys(children).forEach(function(child) {
             Object.keys(child).forEach(function(childKey) {
                 Object.keys(childKey).forEach(function(toy) {
                     emit(
                         id, { "toys": [children[childkey]["toys"][toy]] }
                     );
                 });
             });
         });
     },
     // Reducer
     function(key,values) {
         var output = { "toys": [] };

         values.forEach(function(value) {
             value.toys.forEach(function(toy) {
                 if ( ouput.toys.indexOf( toy ) == -1 )
                     output.toys.push( toy );
             });
         });
     },
     {
         "out": { "inline": 1 }
     }
)

So JavaScript evaluation is the "horrible" approach as this is much slower in execution, and you see the "traversing" code that needs to be implemented. Bad news for performance, so don't do it. Change the structure instead.

As a final part, you could model this differently to avoid the "nested array" concept. And understand that the only real problem with a "nested array" is that "updating" a nested element is really impossible without reading in the whole document and modifying it.

So $push and $pull methods work fine. But using a "positional" $ operator just does not work as the "outer" array index is always the "first" matched element. So if this really was a problem for you then you could do something like this, for example:

 {
    "_id:"0,
    "firstname":"Tom",
    "childtoys" : [
        { 
            "name": "childA", 
            "toy": "batman"
        }.
        { 
            "name": "childA",
            "toy": "car"
        },
        {
            "name": "childA",
            "toy": "train"
        },
        {
            "name": "childB",
            "toy": "doll"
        },
        {
            "name": "childB",
            "toy": "bike"
        },
        {
            "name": "childB",
            "toy": "xbox"
        }
    ],
    "childMovies": [
        {
             "name": "childA"
             "movie": "Ironman"
       },
       {
            "name": "childA",
            "movie": "Deathwish"
       },
       {
            "name": "childB",
            "movie": "Frozen"
       },
       {
            "name": "childB",
            "movie": "Barbie"
       }
  ]
}

That would be one way to avoid the problem with nested updates if you did indeed need to "update" items on a regular basis rather than just $push and $pull items to the "toys" and "movies" arrays.

But the overall message here is to design your data around the access patterns you actually use. MongoDB does generally not like things with a "strict path" in the terms of being able to query or otherwise flexibly issue updates.

Answer 2

Projections in MongoDB make use of '1' and '0' , not 'True'/'False'. Moreover ensure that the fields are specified in the right cases(uppercase/lowercase)

The query should be as below:

db.users.findOne({'_id': 0}, {'_id': 0, 'children.childA.movies': 1})

Which will result in :

{
    "children" : {
        "childA" : {
            "movies" : {
                "movie 1" : "Ironman",
                "movie 2" : "Deathwish"
            }
        }
    }
}

MongoDb: How to get a field (sub document) from a document?

Question

2 answers

solution1
3 ACCPTED 2014-09-19 00:14:01

solution2
0 2014-09-18 13:14:31

MongoDb: How to get a field (sub document) from a document?

Question

2 answers

solution1 3 ACCPTED 2014-09-19 00:14:01

solution2 0 2014-09-18 13:14:31

solution1
3 ACCPTED 2014-09-19 00:14:01

solution2
0 2014-09-18 13:14:31