
MongoDB: Map Reduce: Create one sub-document from another one

I have a MongoDB collection which has documents like this:

{
"_id" : ObjectId("safdsd435tdg54trgds"),
"startDate" : ISODate("2013-07-02T17:35:01.000Z"),
"endDate" : ISODate("2013-08-02T17:35:01.000Z"),
"active" : true,
"channels" : [ 
    1, 2, 3, 4
]
}

I want to convert this to something like this:

{
"_id" : ObjectId("safdsd435tdg54trgds"),
"startDate" : ISODate("2013-07-02T17:35:01.000Z"),
"endDate" : ISODate("2013-08-02T17:35:01.000Z"),
"active" : true,
"channels" : [ 
    1, 2, 3, 4
],
"tags" : [
            {
                "name": "one",
                "type": "channel"
            },
            {
                "name": "two",
                "type": "channel"
            },
            {
                "name": "three",
                "type": "channel"
            },
            {
                "name": "four",
                "type": "channel"
            }
        ]
}

I already have a mapping of what 1, 2, 3, 4 mean. Just for the sake of simplicity I wrote them out as words. The values could be different, but they're static mappings.
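As a minimal sketch of such a static mapping, assuming the channel numbers map to the names used in the example: the names `CHANNEL_NAMES` and `channelsToTags` are hypothetical and not part of the question. Looking names up by key rather than by array position keeps working if the real channel values are not contiguous.

```javascript
// Hypothetical static mapping from channel number to tag name.
const CHANNEL_NAMES = { 1: "one", 2: "two", 3: "three", 4: "four" };

// Convert a channels array into the desired tags sub-documents.
// Channels without an entry in the mapping are skipped.
function channelsToTags(channels) {
  return channels
    .filter((channel) => Object.prototype.hasOwnProperty.call(CHANNEL_NAMES, channel))
    .map((channel) => ({ name: CHANNEL_NAMES[channel], type: "channel" }));
}

console.log(JSON.stringify(channelsToTags([1, 2, 3, 4])));
```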

You seem to be trying to do this update without a big iteration over your collection. You "could" do this with mapReduce, albeit in a very "mapReduce way", as it has its own way of doing things.

So first you want to define a mapper that encapsulates your current document:

var mapFunction = function (){

    var key = this._id;

    var value = {
       startDate: this.startDate,
       endDate: this.endDate,
       active: this.active,
       channels: this.channels

    };

    emit( key, value );
};

Now here the reducer is actually not going to be called, as all the keys from the mapper will be unique, being of course the _id values from the original documents. But to keep the call happy:

var reduceFunction = function(){};

As this is a one-to-one transformation, the work goes in finalize. It could be done in the mapper, but for cleanliness' sake:

var finalizeFunction = function (key, reducedValue) {

    var tags = [
        { name: "one", type: "channel" },
        { name: "two", type: "channel" },
        { name: "three", type: "channel" },
        { name: "four", type: "channel" }
    ];

    reducedValue.tags = [];

    reducedValue.channels.forEach(function(channel) {
        reducedValue.tags.push( tags[ channel -1 ] );
    });

    return reducedValue;

};

Then call the mapReduce:

 db.docs.mapReduce( 
     mapFunction,
     reduceFunction,
    { 
        out: { replace: "newdocs" },
        finalize: finalizeFunction 
    }
 )

So that will output to a new collection, but in the way that mapReduce does it, so you have this:

{
    "_id" : ObjectId("53112b2d0ceb66905ae41259"),
    "value" : {
            "startDate" : ISODate("2013-07-02T17:35:01Z"),
            "endDate" : ISODate("2013-08-02T17:35:01Z"),
            "active" : true,
            "channels" : [ 1, 2, 3, 4 ],
            "tags" : [
                    {
                        "name" : "one",
                        "type" : "channel"
                    },
                    {
                        "name" : "two",
                        "type" : "channel"
                    },
                    {
                        "name" : "three",
                        "type" : "channel"
                    },
                    {
                        "name" : "four",
                        "type" : "channel"
                    }
            ]
    }
}

So all your document fields other than _id are stuck under that value field, which is not the document you want. But that is how mapReduce works.
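To see where that shape comes from, here is a rough plain-JavaScript simulation of the map and finalize flow when every key is unique. It is only a sketch: `simulateMapReduce` is a hypothetical helper, `emit` is passed in as an argument (inside MongoDB it is a global), and the sample document and tag names are made up for illustration.

```javascript
// Simulation only: mimics how each emitted key/value pair is stored
// when every key is unique, so reduce is never invoked.
function simulateMapReduce(docs, mapFn, finalizeFn) {
  const output = [];
  for (const doc of docs) {
    // Each emitted pair becomes one output document of the form { _id, value }.
    const emit = (key, value) => {
      output.push({ _id: key, value: finalizeFn(key, value) });
    };
    mapFn.call(doc, emit);
  }
  return output;
}

const sampleDocs = [{ _id: "doc1", active: true, channels: [1, 2] }];

const simulated = simulateMapReduce(
  sampleDocs,
  function (emit) {
    emit(this._id, { active: this.active, channels: this.channels });
  },
  function (key, value) {
    // Placeholder tag names; a real run would use the static mapping.
    value.tags = value.channels.map((channel) => ({ name: String(channel), type: "channel" }));
    return value;
  }
);

console.log(JSON.stringify(simulated, null, 2));
```

Every output document has exactly two top-level fields, `_id` and `value`, which is why a further reshaping step is needed to get back to the original layout.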

If you really need to get out of jail on this and are willing to wait a bit, the upcoming 2.6 release adds an $out pipeline stage. So you "could" transform the documents in your new collection with $project like this:

db.newdocs.aggregate([

    // Transform the document
    {"$project": { 
        "startDate": "$value.startDate",
        "endDate":   "$value.endDate",
        "active":    "$value.active",
        "channels":  "$value.channels",
        "tags":      "$value.tags"
    }},

    // Output to new collection
    {"$out": "fixeddocs" }

])

So that will be right. But of course this is not your original collection. So to get back to that state you are going to have to .drop() collections and use .renameCollection():

db.newdocs.drop();

db.docs.drop();

db.fixeddocs.renameCollection("docs");  

Now please READ the documentation carefully on this; there are several limitations, and of course you would have to re-create indexes as well.

All of this, and in particular the last stage, is going to result in a lot of disk thrashing. Also keep in mind that you are dropping collections here. It is almost certainly a case for taking your database offline while this is performed.

And even so, the dangers here are real enough that perhaps you can just live with running an iterative loop to update the documents, using arbitrary JavaScript. And if you really must, you could always do that using db.eval() to have it all execute on the server. But if you do, then please read the documentation for that very carefully as well.

But for completeness, even if I'm not advocating this:

db.eval(function(){

    db.docs.find().forEach(function(document) {

        var tags = [
            { name: "one", type: "channel" },
            { name: "two", type: "channel" },
            { name: "three", type: "channel" },
            { name: "four", type: "channel" }
        ];

        document.tags = [];

        document.channels.forEach(function(channel) {
             document.tags.push( tags[ channel -1 ] );
        });

        var id = document._id;
        delete document._id;           

        db.docs.update({ "_id": id },document);

    });

})
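The loop above rewrites each whole document. As an alternative sketch, the same iteration could build updates that set only the tags field with the $set operator. The helper `buildTagUpdates` and the mapping `TAG_NAMES_BY_CHANNEL` are hypothetical names introduced here, and the code is plain JavaScript that only constructs the filter and update objects rather than talking to a database.

```javascript
// Hypothetical static mapping; replace with the real channel-to-name table.
const TAG_NAMES_BY_CHANNEL = { 1: "one", 2: "two", 3: "three", 4: "four" };

// Build one { filter, update } pair per document, touching only the tags field.
function buildTagUpdates(docs) {
  return docs.map((doc) => ({
    filter: { _id: doc._id },
    update: {
      $set: {
        tags: (doc.channels || [])
          // Skip channels that have no entry in the mapping.
          .filter((channel) => TAG_NAMES_BY_CHANNEL[channel] !== undefined)
          .map((channel) => ({ name: TAG_NAMES_BY_CHANNEL[channel], type: "channel" }))
      }
    }
  }));
}

const tagUpdates = buildTagUpdates([{ _id: "a", channels: [2, 4] }]);
console.log(JSON.stringify(tagUpdates, null, 2));
```

In the shell, each pair could then be applied inside the forEach with db.docs.update(filter, update), which leaves the other fields of the document untouched.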
