
MongoDB Map Reduce newbie (PHP)

I'm new to the map reduce concept and even though I'm making some slow progress, I'm running into some issues that I need help with.

I have a simple collection consisting of an id, city and destination, something like this:

{ "_id" : "5230e7e00000000000000000", "city" : "Boston", "to" : "Chicago" },
{ "_id" : "523fe7e00000000000000000", "city" : "New York", "to" : "Miami" },
{ "_id" : "5240e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
{ "_id" : "536fe4e00000000000000000", "city" : "Washington D.C.", "to" : "Boston" },
{ "_id" : "53ffe7e00000000000000000", "city" : "New York", "to" : "Boston" },
{ "_id" : "5740e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
...

(Please note that this data is made up for example purposes.)

I'd like to group the destinations by city, including a count:

{ "city" : "Boston", values : [{"Chicago",1}, {"Miami",2}] }
{ "city" : "New York", values : [{"Miami",1}, {"Boston",1}] }
{ "city" : "Washington D.C.", values : [{"Boston", 1}] }

For this I'm starting to play with this map function:

    function() {
        emit(this.city, this.to);
    }

which performs the expected grouping. My reduce function is this:

    function(key, values) {
        var reduced = {"to":[]};

        for (var i in values) {
            var item = values[i];
            reduced.to.push(item);
        }

        return reduced;
    }

which gives a somewhat expected output:

{ "_id" : ObjectId("522f8a9181f01e671a853adb"), "value" : { "to" : [    "Boston", "Miami" ] } }
{ "_id" : ObjectId("522f933a81f01e671a853ade"), "value" : { "to" : [  "Chicago",  "Miami", "Miami" ] } }
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : "Boston" }

As you can see, I still haven't counted the repeated cities, but for some reason the last result in the output doesn't look right. I'd expected it to be

{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : { "to" : ["Boston"] } }

Does this have anything to do with the fact that there is a single item? Is there any way to obtain this?

Thank you.

I see you are asking about a PHP issue, but you are using JavaScript to ask, so I'm assuming a JavaScript answer will help you move things along. As such, here is the JavaScript needed in the shell to run your aggregation. I strongly suggest getting your aggregation working in the shell (or some other JavaScript editor) first and then translating it into the language of your choice. It is a lot easier to see what is going on, and you get there faster, using this method. You can then run:

use admin
db.runCommand( { setParameter: 1, logLevel: 2 } )

to check the BSON output of your selected language against what the shell sends. This will appear in the terminal if mongod is running in the foreground; otherwise you'll have to look in the logs.

Summing the routes in the aggregation framework [AF] with Mongo is fairly straightforward. The AF is faster and easier to use than map reduce [MR]. In this case, though, they both have a similar issue: simply pushing to an array won't yield a count in and of itself (in MR you need either more logic in your reduce function or a finalize function).
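As a rough illustration of what "more logic in your reduce function" could look like, here is a minimal sketch in plain JavaScript. It also touches on the single-item result from the question: MongoDB does not call reduce for a key that has only one emitted value, so that value passes through in whatever shape map emitted it. Emitting from map the same shape that reduce returns avoids the mismatch. The names `mapDoc`, `reduceCounts`, and `simulateMapReduce` are hypothetical; `simulateMapReduce` is only a stand-in for the server so the sketch can run outside the shell. In the real shell the map function would read from `this` and call the global `emit` instead of taking arguments.

```javascript
// Sample documents from the question (ids omitted).
const docs = [
  { city: "Boston", to: "Chicago" },
  { city: "New York", to: "Miami" },
  { city: "Boston", to: "Miami" },
  { city: "Washington D.C.", to: "Boston" },
  { city: "New York", to: "Boston" },
  { city: "Boston", to: "Miami" },
];

// Map: emit a value that already has the shape reduce returns.
function mapDoc(doc, emit) {
  const counts = {};
  counts[doc.to] = 1;
  emit(doc.city, { to: counts });
}

// Reduce: merge per-destination counts. Because input and output have the
// same shape, it also works when fed its own earlier results.
function reduceCounts(key, values) {
  const reduced = { to: {} };
  for (const value of values) {
    for (const dest of Object.keys(value.to)) {
      reduced.to[dest] = (reduced.to[dest] || 0) + value.to[dest];
    }
  }
  return reduced;
}

// Hypothetical stand-in for the server (not a MongoDB API): groups emitted
// values by key and, like MongoDB, skips reduce for single-value keys.
function simulateMapReduce(documents, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of documents) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

const mrResults = simulateMapReduce(docs, mapDoc, reduceCounts);
console.log(JSON.stringify(mrResults, null, 2));
```

With this shape, the single-route city (Washington D.C.) comes out as `{ "to": { "Boston": 1 } }` even though reduce never runs for it, and Boston's two Miami routes collapse into a count of 2.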

With the AF, using the example data provided, this pipeline is useful:

db.agg1.aggregate([
     {$group:{
         _id: { city: "$city", to: "$to" },  
         count: { $sum: 1 }
     }},
     {$group: {
         _id: "$_id.city",
         to:{ $push: {to: "$_id.to", count: "$count"}}
     }}
]);

The aggregation framework can only operate on known fields, but it allows many pipeline operations, so a problem needs to be broken down with that in mind. Above, the first stage calculates the numbers needed, for which there are 3 fixed fields: the source, the destination, and the count. The second stage has 2 fixed fields, one of which is an array that is only being pushed to (all the data for the final form is already there).
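To make the two stages concrete, here is a small plain-JavaScript sketch that mimics them on the question's sample data. It is not the aggregation framework itself, only an approximation of what each `$group` stage computes; the variable names (`routes`, `pairCounts`, `byCity`, `grouped`) are made up for the illustration, and the real server does not guarantee the order of groups.

```javascript
// Sample documents from the question (ids omitted).
const routes = [
  { city: "Boston", to: "Chicago" },
  { city: "New York", to: "Miami" },
  { city: "Boston", to: "Miami" },
  { city: "Washington D.C.", to: "Boston" },
  { city: "New York", to: "Boston" },
  { city: "Boston", to: "Miami" },
];

// Stage 1 equivalent: count documents per (city, to) pair.
const pairCounts = new Map();
for (const { city, to } of routes) {
  const pairKey = JSON.stringify([city, to]);
  pairCounts.set(pairKey, (pairCounts.get(pairKey) || 0) + 1);
}

// Stage 2 equivalent: regroup by city, pushing { to, count } into an array.
const byCity = new Map();
for (const [pairKey, count] of pairCounts) {
  const [city, to] = JSON.parse(pairKey);
  if (!byCity.has(city)) byCity.set(city, []);
  byCity.get(city).push({ to, count });
}

// Same document shape as the pipeline output: { _id: <city>, to: [...] }.
const grouped = [...byCity].map(([city, to]) => ({ _id: city, to }));
console.log(JSON.stringify(grouped, null, 2));
```

The counting happens entirely in the first pass; the second pass only rearranges data that already exists, which is why a plain `$push` is enough there.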

For MR you can do this:

var map = function() {
    var key = {source:this.city, dest:this.to};
    emit(key, 1);
};

var reduce = function(key, values) {
    return Array.sum(values);
};

A separate function will have to pretty it up, however.
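One possible shape for that separate function is sketched below in plain JavaScript. It assumes the map reduce results have already been read into an array of `{ _id: { source, dest }, value }` documents; the `mrOutput` array is written by hand from the question's sample data rather than taken from a real run, and `groupBySource` is a hypothetical helper name.

```javascript
// Hand-written stand-in for the map reduce results: one document per route.
const mrOutput = [
  { _id: { source: "Boston", dest: "Chicago" }, value: 1 },
  { _id: { source: "Boston", dest: "Miami" }, value: 2 },
  { _id: { source: "New York", dest: "Boston" }, value: 1 },
  { _id: { source: "New York", dest: "Miami" }, value: 1 },
  { _id: { source: "Washington D.C.", dest: "Boston" }, value: 1 },
];

// Hypothetical helper: collect all routes that share a source city.
function groupBySource(results) {
  const bySource = new Map();
  for (const { _id, value } of results) {
    if (!bySource.has(_id.source)) bySource.set(_id.source, []);
    bySource.get(_id.source).push({ to: _id.dest, count: value });
  }
  return [...bySource].map(([city, values]) => ({ city, values }));
}

const pretty = groupBySource(mrOutput);
console.log(JSON.stringify(pretty, null, 2));
```

The same regrouping could be done on the client in PHP after fetching the results; the logic is a single pass over the documents either way.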

If you have any additional questions please don't hesitate to ask.

Best, Charlie
