MongoDB : Map Reduce : Create one sub-document from another one

I have a MongoDB collection whose documents look like this:

{
"_id" : ObjectId("safdsd435tdg54trgds"),
"startDate" : ISODate("2013-07-02T17:35:01.000Z"),
"endDate" : ISODate("2013-08-02T17:35:01.000Z"),
"active" : true,
"channels" : [ 
    1, 2, 3, 4
]
}

I want to convert each document into something like this:

{
"_id" : ObjectId("safdsd435tdg54trgds"),
"startDate" : ISODate("2013-07-02T17:35:01.000Z"),
"endDate" : ISODate("2013-08-02T17:35:01.000Z"),
"active" : true,
"channels" : [ 
    1, 2, 3, 4
],
"tags" : [
    {
        "name" : "one",
        "type" : "channel"
    },
    {
        "name" : "two",
        "type" : "channel"
    },
    {
        "name" : "three",
        "type" : "channel"
    },
    {
        "name" : "four",
        "type" : "channel"
    }
]
}

I already have a mapping of what 1, 2, 3 and 4 mean. For simplicity I have written them here as their spelled-out names; the real values may differ, but the mapping is static.
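As a rough illustration of such a static mapping, here is a minimal plain-JavaScript sketch. The names CHANNEL_NAMES and tagsForChannels are hypothetical, and the mapped values are only the placeholder names used in the example above.

```javascript
// Hypothetical static mapping from channel number to tag name.
// The real names would come from your own mapping.
var CHANNEL_NAMES = { 1: "one", 2: "two", 3: "three", 4: "four" };

// Build the "tags" array for a list of channel numbers.
// Channels that are not in the mapping are skipped.
function tagsForChannels(channels) {
    var tags = [];
    (channels || []).forEach(function (channel) {
        if (Object.prototype.hasOwnProperty.call(CHANNEL_NAMES, channel)) {
            tags.push({ name: CHANNEL_NAMES[channel], type: "channel" });
        }
    });
    return tags;
}

// Channel 9 has no entry in the mapping, so only two tags are produced.
var exampleTags = tagsForChannels([1, 3, 9]);
```

An object keyed by channel number tolerates gaps in the numbering, which an array indexed by position does not.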

You seem to be trying to do this update without a big iteration over your collection. You "could" do this with mapReduce, albeit in a very "mapReduce way", as it has its own way of doing things.

So first you want to define a mapper that encapsulates your current document:

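var mapFunction = function () {

    // Key on the original _id, so every emitted key is unique
    var key = this._id;

    // Carry the existing fields over as the emitted value
    var value = {
        startDate: this.startDate,
        endDate: this.endDate,
        active: this.active,
        channels: this.channels
    };

    emit( key, value );
};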

The reducer is never actually going to be called here, because every key emitted by the mapper is unique (they are the _id values from the original documents). But mapReduce still requires one, so to make the call happy:

var reduceFunction = function (key, values) {};

As this is a one-to-one transformation, the work goes in finalize. It could be done in the mapper, but for cleanliness' sake:

var finalizeFunction = function (key, reducedValue) {

    // Static lookup: channel number N maps to tags[N - 1]
    var tags = [
        { name: "one", type: "channel" },
        { name: "two", type: "channel" },
        { name: "three", type: "channel" },
        { name: "four", type: "channel" }
    ];

    reducedValue.tags = [];

    // Add one tag per channel, in channel order
    reducedValue.channels.forEach(function (channel) {
        reducedValue.tags.push( tags[channel - 1] );
    });

    return reducedValue;

};
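To see why the reducer stub is never invoked, here is a rough in-memory model of the map, reduce and finalize flow. It is a simplified sketch, not MongoDB's implementation: simulateMapReduce is a hypothetical name, and unlike the real shell (where emit is a global) the mapper here receives emit as an argument.

```javascript
// Simplified in-memory model of the mapReduce flow; NOT MongoDB's implementation.
function simulateMapReduce(docs, mapFn, reduceFn, finalizeFn) {
    var groups = {};  // stringified key -> { key: ..., values: [...] }
    var order = [];   // keys in first-seen order

    // Map phase: run the mapper with `this` bound to each document.
    docs.forEach(function (doc) {
        var emit = function (key, value) {
            var k = String(key);
            if (!groups[k]) {
                groups[k] = { key: key, values: [] };
                order.push(k);
            }
            groups[k].values.push(value);
        };
        mapFn.call(doc, emit);
    });

    // Reduce phase: only keys with more than one value reach the reducer.
    // Finalize phase: runs once per key on the (possibly un-reduced) value.
    return order.map(function (k) {
        var group = groups[k];
        var reduced = group.values.length > 1
            ? reduceFn(group.key, group.values)
            : group.values[0];
        var value = finalizeFn ? finalizeFn(group.key, reduced) : reduced;
        return { _id: group.key, value: value };
    });
}

var reduceCalls = 0;
var sampleOutput = simulateMapReduce(
    [ { _id: "a", channels: [1, 2] }, { _id: "b", channels: [4] } ],
    function (emit) { emit(this._id, { channels: this.channels }); },
    function (key, values) { reduceCalls += 1; return values[0]; },
    function (key, reducedValue) {
        var names = ["one", "two", "three", "four"];
        reducedValue.tags = reducedValue.channels.map(function (channel) {
            return { name: names[channel - 1], type: "channel" };
        });
        return reducedValue;
    }
);
// reduceCalls is still 0 here, because every emitted key was unique.
```

Each result in this model is already wrapped as an _id plus a value field, the same shape as the mapReduce output collection.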

Then call the mapReduce:

db.docs.mapReduce(
    mapFunction,
    reduceFunction,
    {
        out: { replace: "newdocs" },
        finalize: finalizeFunction
    }
);

That will output to a new collection, but in the shape that mapReduce produces, so you have this:

{
    "_id" : ObjectId("53112b2d0ceb66905ae41259"),
    "value" : {
            "startDate" : ISODate("2013-07-02T17:35:01Z"),
            "endDate" : ISODate("2013-08-02T17:35:01Z"),
            "active" : true,
            "channels" : [ 1, 2, 3, 4 ],
            "tags" : [
                    {
                        "name" : "one",
                        "type" : "channel"
                    },
                    {
                        "name" : "two",
                        "type" : "channel"
                    },
                    {
                        "name" : "three",
                        "type" : "channel"
                    },
                    {
                        "name" : "four",
                        "type" : "channel"
                    }
            ]
    }
}

So all your document fields other than _id are stuck under that value field, which is not the document that you want. But that is how mapReduce works.
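The reshaping needed to undo this is just lifting the fields out of value. As a minimal plain-JavaScript sketch (unwrapValue is a hypothetical helper name, and the sample document is cut down for brevity):

```javascript
// Hypothetical helper: lift the fields of a mapReduce output document
// ({ _id: ..., value: { ... } }) back up to the top level.
function unwrapValue(mrDoc) {
    var flat = { _id: mrDoc._id };
    Object.keys(mrDoc.value || {}).forEach(function (field) {
        flat[field] = mrDoc.value[field];
    });
    return flat;
}

var flatExample = unwrapValue({
    _id: "53112b2d0ceb66905ae41259",
    value: {
        active: true,
        channels: [1, 2],
        tags: [
            { name: "one", type: "channel" },
            { name: "two", type: "channel" }
        ]
    }
});
// flatExample has _id, active, channels and tags at the top level
// and no "value" field.
```

Applying this per document from the shell would again mean iterating the whole collection on the client, which is what the approach here is trying to avoid.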

If you really need a way out of this and are willing to wait a bit, the upcoming 2.6 release adds an $out pipeline stage. So you "could" transform the documents in your new collection with $project like this:

db.newdocs.aggregate([

    // Transform the document
    {"$project": { 
        "startDate": "$value.startDate",
        "endDate":   "$value.endDate",
        "active":    "$value.active",
        "channels":  "$value.channels",
        "tags":      "$value.tags"
    }},

    // Output to new collection
    {"$out": "fixeddocs" }

])

So that will have the right shape. But of course this is not your original collection. To get back to that state you are going to have to .drop() collections and use .renameCollection():

db.newdocs.drop();

db.docs.drop();

db.fixeddocs.renameCollection("docs");  

Now please READ the documentation on this carefully: there are several limitations, and of course you would have to re-create indexes as well.

All of this, and in particular the last stage, is going to result in a lot of disk thrashing. Also keep in mind that you are dropping collections here, so this is almost certainly a case for taking your database offline while it is performed.

And even then, the dangers here are real enough that perhaps you can just live with running an iterative loop to update the documents, using arbitrary JavaScript. If you really must, you could always do that using db.eval() to have it all execute on the server. But if you do, then please read the documentation for that very carefully as well.

But for completeness, even though I'm not advocating this:

db.eval(function () {

    // Static lookup: channel number N maps to tags[N - 1]
    var tags = [
        { name: "one", type: "channel" },
        { name: "two", type: "channel" },
        { name: "three", type: "channel" },
        { name: "four", type: "channel" }
    ];

    db.docs.find().forEach(function (document) {

        document.tags = [];

        document.channels.forEach(function (channel) {
            document.tags.push( tags[channel - 1] );
        });

        // Replace the stored document; _id is removed from the replacement body
        var id = document._id;
        delete document._id;

        db.docs.update({ "_id": id }, document);

    });

});
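A possible variation on the loop above, sketched under assumptions rather than tested against a server: instead of replacing each whole document, build a $set modifier that adds only the tags field. The helper name buildTagUpdate and its names parameter are hypothetical.

```javascript
// Hypothetical helper: build the query and a $set modifier that adds only
// the "tags" field, leaving the other fields of the document untouched.
function buildTagUpdate(doc, names) {
    var tags = (doc.channels || []).map(function (channel) {
        return { name: names[channel - 1], type: "channel" };
    });
    return {
        filter: { _id: doc._id },
        update: { $set: { tags: tags } }
    };
}

var exampleOp = buildTagUpdate(
    { _id: 7, channels: [2, 4] },
    ["one", "two", "three", "four"]
);
// exampleOp.filter is { _id: 7 }
// exampleOp.update is { $set: { tags: [ two, four ] } } with type "channel" on each tag
```

Inside the forEach callback this would be used as db.docs.update(op.filter, op.update), where op is the object returned by buildTagUpdate for that document. The point of $set is that a concurrent change to another field is not overwritten by a stale copy of the document.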
