
How to query all subdocuments

I'm just getting started with MongoDB and Node.js (using Mongoose).

I have a collection of Stories, each of which can have one or more Tags. It looks something like this:

{
    title: "The red fox",
    content: "The red fox jumps away...",
    tags: [
        {
            tagname: "fairytale",
            user: "pippo"
        },
        {
            tagname: "funny",
            user: "pluto"
        },
        {
            tagname: "fox",
            user: "paperino"
        }
    ]
},

... other stories

Now I want to make a tag cloud.

It means querying Stories for all tags.

In a relational world (e.g. MySQL) I would have a Stories table, a Tags table and a Stories_Tags table (many-to-many). Then I'd query the Tags table or something like that.

Is there a way to do so? (I'm sure yes)

If yes, is it a good practice? Or does it break the nosql paradigm?

Can you imagine a better way for my schema design?

Here is how you do this using the aggregation framework (you need the just-released version 2.2).

db.stories.aggregate(
[
    {
        "$unwind" : "$tags"
    },
    {
        "$group" : {
            "_id" : "$tags.tagname",
            "total" : {
                "$sum" : 1
            }
        }
    },
    {
        "$sort" : {
            "total" : -1
        }
    }
])

Your result will look like this:

{
    "result" : [
        {
            "_id" : "fairytale",
            "total" : 3
        },
        {
            "_id" : "funny",
            "total" : 2
        },
        {
            "_id" : "silly",
            "total" : 1
        },
        {
            "_id" : "fox",
            "total" : 1
        }
    ],
    "ok" : 1
}
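
To make the three stages easier to follow, here is a plain JavaScript sketch that imitates them on an in-memory array. It only illustrates what $unwind, $group and $sort compute; it is not how the server implements them, and the function name and sample stories are made up for this example.

```javascript
// Plain-JavaScript imitation of the $unwind / $group / $sort pipeline.
// Illustration only: tagTotals and sampleStories are hypothetical names.
function tagTotals(stories) {
    // $unwind: produce one entry per (story, tag) pair
    var unwound = [];
    stories.forEach(function (story) {
        (story.tags || []).forEach(function (tag) {
            unwound.push(tag);
        });
    });

    // $group with { $sum: 1 }: count the entries for each tagname
    var totals = Object.create(null);
    unwound.forEach(function (tag) {
        totals[tag.tagname] = (totals[tag.tagname] || 0) + 1;
    });

    // $sort: order by total, highest first
    return Object.keys(totals)
        .map(function (name) { return { _id: name, total: totals[name] }; })
        .sort(function (a, b) { return b.total - a.total; });
}

var sampleStories = [
    { title: "The red fox", tags: [{ tagname: "fairytale" }, { tagname: "funny" }, { tagname: "fox" }] },
    { title: "A second story", tags: [{ tagname: "fairytale" }, { tagname: "funny" }] },
    { title: "A third story", tags: [{ tagname: "fairytale" }] }
];

console.log(tagTotals(sampleStories));
```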

Welcome to Mongo

The best "Schema" for your data will be something like this.

You create a collection called stories, and each story will be a document in this collection. You can then easily query your data with something like this:

db.stories.find({ "tags.tagname": "fairytale"}); // will find all documents that have fairytale as a tagname.

UPDATE

db.stories.find({ "tags.tagname": { $exists : true }}); // will find all documents that have a tagname.

Notice the dot notation in the find query; that is how you reach into arrays and objects in MongoDB.

You can use a map-reduce (MR) job to accomplish this. In the map step you would simply pick out the tags and emit them:

var map = function(){
     for(var i=0;i<this.tags.length;i++){
         emit(this.tags[i].tagname, {count: 1});
     }
}

And then your reduce would run through the emitted documents, basically summing up the number of times each tag was seen.
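
A minimal sketch of such a reduce function, assuming the map above emits {count: 1} per tag. The way of running it shown in the trailing comment is one option among several, not necessarily what the answer had in mind.

```javascript
// Reduce for the map above: sum the partial counts emitted for one tag.
// It returns the same shape that map emits ({count: n}), because MongoDB
// may call reduce again on its own earlier output for the same key.
var reduce = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return { count: total };
};

// In the mongo shell this could be run roughly as:
//   db.stories.mapReduce(map, reduce, { out: { inline: 1 } });
```

Keeping the output in the same shape as the emitted values is what makes the function safe to re-run on partial results.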

If you upgrade to the latest unstable 2.2 you can also use the aggregation framework. You would use the $unwind and $group stages (with the $sum accumulator) to pull the tags out of each post and sum them up, giving you a score-based tag cloud in which you can size the text of each tag according to its count.
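
As a rough sketch of the sizing step, the counts can be scaled linearly into a font-size range. The function name, the pixel range and the input shape ({_id, total}, as produced by a $group on the tag name) are assumptions made for this example.

```javascript
// Map tag counts onto font sizes between minPx and maxPx (linear scaling).
// tagCloudSizes is a hypothetical helper, not part of MongoDB or Mongoose.
function tagCloudSizes(totals, minPx, maxPx) {
    var counts = totals.map(function (t) { return t.total; });
    var lo = Math.min.apply(null, counts);
    var hi = Math.max.apply(null, counts);
    return totals.map(function (t) {
        // If every tag has the same count, give them all the largest size.
        var ratio = hi === lo ? 1 : (t.total - lo) / (hi - lo);
        return {
            tagname: t._id,
            size: Math.round(minPx + ratio * (maxPx - minPx))
        };
    });
}

console.log(tagCloudSizes([
    { _id: "fairytale", total: 3 },
    { _id: "funny", total: 2 },
    { _id: "fox", total: 1 }
], 10, 30));
```

A logarithmic scale may look better when a few tags dominate, but linear scaling is the simplest starting point.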

If yes, is it a good practice? Or does it break the nosql paradigm?

This is a pretty standard problem in MongoDB and one you won't get away from. With the reusable structure comes the inevitable need to do some complex querying over it. Fortunately, in 2.2 there is the aggregation framework to save you.

As to whether this is a good or bad approach: it is a pretty standard one, and as such it is neither good nor bad.

As to making the structure better, you could pre-aggregate unique tags with their count to a separate collection. This would make it easier to build your tag cloud in realtime.

Pre-aggregation is a way of building the collection you would normally get from an MR, without needing to run MRs or the aggregation framework. It is normally driven by events in your app: when a user creates a post or retags a post, that triggers a pre-aggregation event that updates a "tag_count" collection, whose documents look like:

{
    _id: {},
    tagname: "",
    count: 1
}

When the event is triggered your app will loop through the tags on the post basically doing $inc upserts like so:

db.tag_count.update({tagname: 'whoop'}, {$inc: {count: 1}}, true);

You now have a collection of tags with their counts across your blog. From there you go the same route as with the MR and just query this collection to get your data out. You would of course need to handle deletion and update events, but you get the general idea.
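
One way to handle both creation and deletion events is to build the $inc operations from a story's tags with a positive or negative delta. This is only a sketch: tagCountOps is a hypothetical helper, and the bulkWrite calls in the comments assume the Node.js driver (Mongoose models expose a similar bulkWrite).

```javascript
// Build one $inc operation per tag on a story.
// delta = 1 when a tag is added, delta = -1 when it is removed.
// tagCountOps is a hypothetical helper name used for illustration.
function tagCountOps(tags, delta) {
    return tags.map(function (tag) {
        return {
            updateOne: {
                filter: { tagname: tag.tagname },
                update: { $inc: { count: delta } },
                // Only create missing tag_count documents when incrementing.
                upsert: delta > 0
            }
        };
    });
}

// Assumed usage with the Node.js driver (skip the call if there are no tags):
//   await db.collection("tag_count").bulkWrite(tagCountOps(story.tags, 1));   // story created
//   await db.collection("tag_count").bulkWrite(tagCountOps(story.tags, -1));  // story deleted
console.log(JSON.stringify(tagCountOps([{ tagname: "fox", user: "paperino" }], 1)));
```

For a retag event, the same helper can be applied twice: once with -1 for the removed tags and once with 1 for the added ones.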

Well, there are different ways, and I think there is no difference between your solution and this one.

You can also copy and paste its map_reduce method to output a tag-count hash.
