MapReduce in MongoDB doesn't output

Question

I was trying to use MongoDB 2.4.3 (also tried 2.4.4) with mapReduce on a cluster with 2 shards, each with 3 replicas. My problem is that the results of the mapReduce job are not being reduced into the output collection. I tried an incremental map-reduce. I also tried "merging" instead of reducing, but that didn't work either.

The map-reduce command, run on mongos (coll isn't sharded):

db.coll.mapReduce(map, reduce, {out: {reduce: "events", "sharded": true}})

Which yields the following output:

{
    "result" : "events",
    "counts" : {
        "input" : NumberLong(2),
        "emit" : NumberLong(2),
        "reduce" : NumberLong(0),
        "output" : NumberLong(28304112)
    },
    "timeMillis" : 418,
    "timing" : {
        "shardProcessing" : 11,
        "postProcessing" : 407
    },
    "shardCounts" : {
        "stats2/192.168.…:27017,192.168.…" : {
            "input" : 2,
            "emit" : 2,
            "reduce" : 0,
            "output" : 2
        }
    },
    "postProcessCounts" : {
        "stats1/192.168.…:27017,…" : {
            "input" : NumberLong(0),
            "reduce" : NumberLong(0),
            "output" : NumberLong(14151042)
        },
        "stats2/192.168.…:27017,…" : {
            "input" : NumberLong(0),
            "reduce" : NumberLong(0),
            "output" : NumberLong(14153070)
        }
    },
    "ok" : 1,
}

So I see that the mapReduce ran over 2 records, which resulted in 2 records being output. However, in the postProcessCounts for both shards the input count stays 0. Searching for the record by _id also yields no result. I wasn't able to find any related error messages in the MongoDB log file.
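The mismatch described above can be checked mechanically by comparing what the shards produced with what the post-processing step received. The following is a minimal sketch in plain JavaScript (for example under Node.js), not a MongoDB API: `findDroppedShardOutput` is a hypothetical helper, the shard names are shortened placeholders, and it assumes the `NumberLong` values in the result document have already been converted to plain numbers.

```javascript
// Hypothetical helper (not part of MongoDB): flags a sharded mapReduce result
// in which the shards produced output but post-processing received no input.
function findDroppedShardOutput(result) {
  // Sum one numeric field across all per-shard count documents.
  const sum = (counts, field) =>
    Object.values(counts || {}).reduce(
      (total, c) => total + Number(c[field] || 0),
      0
    );

  const shardOutput = sum(result.shardCounts, "output");
  const postInput = sum(result.postProcessCounts, "input");

  return {
    shardOutput,
    postInput,
    suspicious: shardOutput > 0 && postInput === 0,
  };
}

// Counts copied from the two runs in the question; shard names are placeholders.
const failingRun = {
  shardCounts: { stats2: { input: 2, emit: 2, reduce: 0, output: 2 } },
  postProcessCounts: {
    stats1: { input: 0, reduce: 0, output: 14151042 },
    stats2: { input: 0, reduce: 0, output: 14153070 },
  },
};
const workingRun = {
  shardCounts: { stats2: { input: 2, emit: 2, reduce: 0, output: 2 } },
  postProcessCounts: {
    stats1: { input: 2, reduce: 0, output: 2 },
    stats2: { input: 2, reduce: 0, output: 2 },
  },
};

console.log(findDroppedShardOutput(failingRun).suspicious); // true
console.log(findDroppedShardOutput(workingRun).suspicious); // false
```

A check like this only detects the symptom; it says nothing about why the post-processing step saw no input.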

I then tried to reproduce this with a newly created output collection, which I also sharded on hashed _id and gave the same indexes, but I wasn't able to reproduce it. When outputting the same input to a different collection:

db.coll.mapReduce(map, reduce, {out: {reduce: "events_test2", "sharded": true}})

The result is stored in the output collection, and I got the following output:

{
    "result" : "events_test2",
    "counts" : {
        "input" : NumberLong(2),
        "emit" : NumberLong(2),
        "reduce" : NumberLong(0),
        "output" : NumberLong(4)
    },
    "timeMillis" : 321,
    "timing" : {
        "shardProcessing" : 68,
        "postProcessing" : 253
    },
    "shardCounts" : {
        "stats2/192.168.…:27017,…" : {
            "input" : 2,
            "emit" : 2,
            "reduce" : 0,
            "output" : 2
        }
    },
    "postProcessCounts" : {
        "stats1/192.168.…:27017,…" : {
            "input" : NumberLong(2),
            "reduce" : NumberLong(0),
            "output" : NumberLong(2)
        },
        "stats2/192.168.…:27017,…" : {
            "input" : NumberLong(2),
            "reduce" : NumberLong(0),
            "output" : NumberLong(2)
        }
    },
    "ok" : 1,
}

When I run the script again with the same input, outputting into the second collection again, postProcessCounts shows that it is reducing. So the map and reduce functions do their job fine. Why doesn't it work on the larger first collection? Am I doing something wrong here? Are there any special limitations on collections that can be used as output for map-reduce?
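For context on why a second run into the same collection reports reduces: with the `reduce` output action, a new result whose key already exists in the output collection is combined with the existing document through the reduce function. The following is a simplified in-memory sketch of that documented behaviour in plain JavaScript, not MongoDB code. `applyReduceOutput`, the `Map` standing in for the output collection, and the sample keys are all made up for illustration.

```javascript
// Illustration only: models the effect of out: {reduce: ...} on an output
// collection, represented here as a Map from _id to value.
function applyReduceOutput(collection, newResults, reduce) {
  let reduceCount = 0;
  for (const { _id, value } of newResults) {
    if (collection.has(_id)) {
      // Key already present: combine the existing and the new value.
      reduceCount += 1;
      collection.set(_id, reduce(_id, [collection.get(_id), value]));
    } else {
      // New key: the value is stored as-is, no reduce call needed.
      collection.set(_id, value);
    }
  }
  return reduceCount;
}

const sumValues = (key, values) => values.reduce((a, b) => a + b, 0);
const outputCollection = new Map();
const batch = [
  { _id: "k1", value: 1 },
  { _id: "k2", value: 1 },
];

console.log(applyReduceOutput(outputCollection, batch, sumValues)); // 0 (first run)
console.log(applyReduceOutput(outputCollection, batch, sumValues)); // 2 (second run)
console.log(outputCollection.get("k1")); // 2
```

In this model the first run into an empty collection performs no reduces and the second run performs one per key, which matches the pattern reported for events_test2.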

Answer 1

"mapReduce is run over 2 records, which results in 2 records outputted. However in the postProcessCounts for both shards the input count stays 0."

Map is run over 2 records. If those two records have different keys, then map will output 2 keys and a value for each, which is normal.

But something I noticed in an older version of MongoDB (not sure if this applies in your case) is that if the values array for the reduce phase has a length of 1, then reducing is skipped.
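The single-value case can be illustrated without a database. The following is a simplified in-memory model in plain JavaScript, not MongoDB's implementation: `runMapReduce` and the sample documents are hypothetical, and unlike the mongo shell (where map uses `this` and a global `emit`) the map function here receives the document and `emit` as arguments. It groups emitted values by key and calls reduce only when a key has more than one value.

```javascript
// Simplified in-memory model of map-reduce (illustration only).
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  let emitCount = 0;
  const emit = (key, value) => {
    emitCount += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);

  let reduceCount = 0;
  const output = [];
  for (const [key, values] of groups) {
    let value;
    if (values.length === 1) {
      value = values[0]; // single value for this key: reduce is skipped
    } else {
      reduceCount += 1;
      value = reduce(key, values);
    }
    output.push({ _id: key, value });
  }
  return {
    counts: {
      input: docs.length,
      emit: emitCount,
      reduce: reduceCount,
      output: output.length,
    },
    output,
  };
}

const mapByUser = (doc, emit) => emit(doc.user, doc.n);
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);

// Two documents with distinct keys: nothing to reduce.
const distinct = runMapReduce(
  [{ user: "a", n: 1 }, { user: "b", n: 5 }],
  mapByUser,
  sumReduce
);
console.log(distinct.counts); // { input: 2, emit: 2, reduce: 0, output: 2 }
```

Under this model, a reduce count of 0 with 2 inputs and 2 outputs is expected whenever the two input records map to different keys, so that figure alone does not indicate a failure.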

Is the output collection empty in the first case?

Solution 1
0 2015-06-04 13:17:36
