简体   繁体   中英

MapReduce in MongoDB doesn't output

I was trying to use MongoDB 2.4.3 (also tried 2.4.4) with mapReduce on a cluster with 2 shards with each 3 replicas. I have a problem with results of the mapReduce job not being reduced into output collection. I tried an Incremental Map Reduce . I also tried "merging" instead of reducing, but that didn't work either.

The map reduce command run on mongos: (coll isn't sharded)

db.coll.mapReduce(map, reduce, {out: {reduce: "events", "sharded": true}})

Which yields the following output:

{
    "result" : "events",
    "counts" : {
        "input" : NumberLong(2),
        "emit" : NumberLong(2),
        "reduce" : NumberLong(0),
        "output" : NumberLong(28304112)
    },
    "timeMillis" : 418,
    "timing" : {
        "shardProcessing" : 11,
        "postProcessing" : 407
    },
    "shardCounts" : {
        "stats2/192.168.…:27017,192.168.…" : {
            "input" : 2,
            "emit" : 2,
            "reduce" : 0,
            "output" : 2
        }
    },
    "postProcessCounts" : {
        "stats1/192.168.…:27017,…" : {
            "input" : NumberLong(0),
            "reduce" : NumberLong(0),
            "output" : NumberLong(14151042)
        },
        "stats2/192.168.…:27017,…" : {
            "input" : NumberLong(0),
            "reduce" : NumberLong(0),
            "output" : NumberLong(14153070)
        }
    },
    "ok" : 1,
}

So I see that the mapReduce is run over 2 records, which results in 2 records outputted. However in the postProcessCounts for both shards the input count stays 0. Also trying to find the record with a search on _id yields no result. In the log file of MongoDB I wasn't able to find error messages related to this.

After trying to reproduce this with a newly created output collection, that I also sharded on hashed _id and I also gave the same indexes, I wasn't able to reproduce this. When outputting the same input to a different collection

db.coll.mapReduce(map, reduce, {out: {reduce: "events_test2", "sharded": true}})

The result is stored in the output collection and I got the following output:

{
    "result" : "events_test2",
    "counts" : {
        "input" : NumberLong(2),
        "emit" : NumberLong(2),
        "reduce" : NumberLong(0),
        "output" : NumberLong(4)
    },
    "timeMillis" : 321,
    "timing" : {
        "shardProcessing" : 68,
        "postProcessing" : 253
    },
    "shardCounts" : {
        "stats2/192.168.…:27017,…" : {
            "input" : 2,
            "emit" : 2,
            "reduce" : 0,
            "output" : 2
        }
    },
    "postProcessCounts" : {
        "stats1/192.168.…:27017,…" : {
            "input" : NumberLong(2),
            "reduce" : NumberLong(0),
            "output" : NumberLong(2)
        },
        "stats2/192.168.…:27017,…" : {
            "input" : NumberLong(2),
            "reduce" : NumberLong(0),
            "output" : NumberLong(2)
        }
    },
    "ok" : 1,
}

When running the script again with the same input ouputting again in the second collection, it shows that it is reducing in postProcessCounts. So the map and reduce functions do their job fine. Why doesn't it work on the larger first collection? Am I doing something wrong here? Are there any special limitations on collections that can be used as output for map-reduce?

mapReduce is run over 2 records, which results in 2 records outputted. However in the postProcessCounts for both shards the input count stays 0.

Map is run over 2 records. If those two records have a different key then the Map will output 2 keys and a value for each. Which is normal.

But something that I noticed in an older version of MongoDB (not sure if this applies in your case) is that if the "values array " for the reduce phase have a length, then reducing will be skipped.

Is the output collection empty in the first case?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM