[英]MongoDb MapReduce Group by Key NOT Value
i am trying to write a mapreduce function to accumulate statistics from a mongodb. 我正在尝试编写一个mapreduce函数以从mongodb累积统计信息。 However.. My teammate who created the data structure saved the data as followed: 但是,创建数据结构的队友将数据保存如下:
"statistics": {
"20111206": {
"CN": {
"Beijing": {
"cart": 1,
"cart_users": [
{ "$oid" : "4EDD73938EAD0E5420000000" }
],
"downloads": {
"wmv": {
"mid": 1
}
},
"orders": {
"wmv": {
"mid": 1
}
}
}
}
}
}
The Problem is that a lot of values i need to group by are just stored in the keys (like CN or BEJING in the example) . 问题在于,我需要分组的许多值仅存储在键中(例如示例中的CN或BEJING)。 These can be country codes, video formats etc... so i dont want to harcode any of these in the mapreduce function. 这些可以是国家代码,视频格式等...所以我不想在mapreduce函数中对其中的任何一个进行编码。
The forEach function which i used for the reduce part only passes in the values as an argument.. 我用于化简部分的forEach函数仅将值作为参数传递。
So the question is: is there any way to perform a mapReduce on this and group by keys or must i first convert the data into a new structure which looks more or less somthing like this: 所以问题是:有没有办法在此上执行mapReduce并按键分组,或者我必须首先将数据转换成看起来或多或少像这样的新结构:
{
"movie_id": "4edcd4f29a4e61c00c000059",
"country": "CN",
"city": "Beijing",
"list": [
{
"user_id": { "$oid" : "4EDD75388EAD0E5720010000" },
"downloads": {
"cnt": 1,
"list": [
{
"format": "wmv",
"quality": "high"
}
]
},
"orders": {
"cnt": 1,
"list": [
{
"format": "wmv",
"quality": "high"
}
]
}
}
]
}
Say your collection is set up with records like the following: 假设您的收藏夹设置了如下记录:
> db.test_col.findOne()
{
"_id" : ObjectId("4f90ed994d2246dd7996e042"),
"statistics" : {
"20111206" : {
"CN" : {
"Beijing" : {
"cart" : 1,
"cart_users" : [
{
"oid" : "4EDD73938EAD0E5420000000"
}
],
"downloads" : {
"wmv" : {
"mid" : 1
}
},
"orders" : {
"wmv" : {
"mid" : 1
}
}
}
}
}
}
}
Here's a command that will group by country, provide a list of cities, and total count for the country. 这是一个将按国家/地区分组,提供城市列表以及该国家/地区总数的命令。 It should get you closer to what you
are
were trying to do: 它应该让你更接近你
正在
试图做的事:
db.runCommand({ mapreduce: "test_col",
map: function () {
var l0 = this.statistics,
date = Object.keySet(l0)[0],
l1 = l0[date],
country = Object.keySet(l1)[0],
l2 = l1[country],
city = Object.keySet(l2)[0],
data = l2[city];
emit(country, { date: date, city: city, data: data });
},
reduce: function (country, values) {
var r = { cities: [], count: 0 };
values.forEach(function (v) {
if (r.cities.indexOf(v.city) == -1) r.cities.push(v.city);
r.count++;
});
return r;
},
out: { reduce: "test_col_reduce" }
});
The output for my test data looks like this: 我的测试数据的输出如下所示:
> db.test_col_reduce.find()
{ "_id" : "AR", "value" : { "cities" : [ "San Juan", "Buenos Aires", "Cordoba", "Rosario" ], "count" : 18 } }
{ "_id" : "BZ", "value" : { "cities" : [ "Morico", "San Ignacio", "Corozal" ], "count" : 15 } }
{ "_id" : "CN", "value" : { "cities" : [ "Beijing", "Shanghai", "HongKong" ], "count" : 26 } }
{ "_id" : "US", "value" : { "cities" : [ "San Diego", "Los Angeles", "San Francisco", "New York" ], "count" : 27 } }
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.