Trouble Pivoting data with Map Reduce
I am having trouble pivoting my dataset with map reduce. I've been using the MongoDB cookbook for help, but I'm getting some weird errors. I want to take the collection below and pivot it so that each user has a list of all of their review ratings.
My collection looks like this:
{
    'type': 'review',
    'business_id': (encrypted business id),
    'user_id': (encrypted user id),
    'stars': (star rating),
    'text': (review text),
}
Map function (wrapped in Python):
map = Code("""
function(){
    key = {user : this.user_id};
    value = {ratings: [this.business_id, this.stars]};
    emit(key, value);
}
""")
The map function should return an array of values associated with the key...
Reduce function (wrapped in Python):
reduce = Code("""
function(key, values){
    var result = { value: [] };
    temp = [];
    for (var i = 0; i < values.length; i++){
        temp.push(values[i].ratings);
    }
    result.value = temp;
    return result;
}
""")
However, the results contain one fewer rating than the total. In fact, some users have None returned, which can't happen. Some entries look like the following:
{u'_id': {u'user': u'zwZytzNIayFoQVEG8Xcvxw'}, u'value': [None, [u'e9nN4XxjdHj4qtKCOPQ_vg', 3.0], None, [...]...]
I can't pinpoint what in my code is causing this. If there are 3 reviews, they all have business IDs and ratings in the document. Plus, using 'values.length + 1' in my loop condition breaks values[i] for some reason.
Edit 1
I've embraced the fact that reduce gets called multiple times on its own output, so below is my new reducer. This returns an array of [business, rating, business, rating]. Any idea how to output [business, rating] arrays instead of one giant array?
function(key, values){
    var result = { ratings:[] };
    var temp = [];
    values.forEach(function(value){
        value.ratings.forEach(function(rating){
            if(temp.indexOf(rating) == -1){
                temp.push(rating);
            }
        });
    });
    result.ratings = temp;
    return result;
}
Here's a test example:
1) Add some sample data:
db.test.drop();
db.test.insert(
    [{
        'type': 'review',
        'business_id': 1,
        'user_id': 1,
        'stars': 1,
    },
    {
        'type': 'review',
        'business_id': 2,
        'user_id': 1,
        'stars': 2,
    },
    {
        'type': 'review',
        'business_id': 2,
        'user_id': 2,
        'stars': 3,
    }]
);
2) Map function
var map = function() {
    // Wrap the [business_id, stars] pair in the same shape that reduce returns.
    emit(this.user_id, { ratings: [[this.business_id, this.stars]] });
};
Here we emit each value in the shape we want it to have at the end of the process. Why? Because if a user (the key we are grouping by) has only a single review, that result never goes through the reduce phase.
3) Reduce function
var reduce = function(key, values) {
    var result = { ratings: [] };
    // Each value holds one or more [business_id, stars] pairs.
    values.forEach(function(value) {
        result.ratings = result.ratings.concat(value.ratings);
    });
    return result;
};
Here we collect all the values. Because map already nests each pair inside a ratings array and reduce returns that same shape, the function still works when MongoDB calls reduce again on its own earlier output.
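A reduce function can be checked outside MongoDB by feeding its own output back in as one of the values. The sketch below is plain JavaScript and needs no server. It assumes values shaped like `{ ratings: [[business_id, stars], ...] }`; `brokenReduce` mirrors the first reducer from the question, and the sample values are made up for illustration.

```javascript
// Assumed value shape: { ratings: [[business_id, stars], ...] }.
function reduce(key, values) {
    var result = { ratings: [] };
    values.forEach(function (value) {
        result.ratings = result.ratings.concat(value.ratings);
    });
    return result;
}

var a = { ratings: [[1, 1]] };
var b = { ratings: [[2, 2]] };
var c = { ratings: [[3, 5]] };

// Reducing everything at once...
var onePass = reduce("user", [a, b, c]);
// ...should match reducing a partial result together with the rest.
var twoPass = reduce("user", [reduce("user", [a, b]), c]);
console.log(JSON.stringify(onePass) === JSON.stringify(twoPass)); // true

// The question's first reducer reads `.ratings` but returns `.value`,
// so its own output has no `.ratings` when it is reduced again.
function brokenReduce(key, values) {
    var temp = [];
    for (var i = 0; i < values.length; i++) {
        temp.push(values[i].ratings);
    }
    return { value: temp };
}

var a0 = { ratings: [1, 1] };
var b0 = { ratings: [2, 2] };
var c0 = { ratings: [3, 5] };
var broken = brokenReduce("user", [brokenReduce("user", [a0, b0]), c0]);
console.log(broken.value); // first element is undefined
```

The `undefined` entry in the second case is a plausible source of the `None` entries in the question's output, since an earlier partial result replaces the ratings it was built from.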
4) Run the map reduce:
db.test.mapReduce(map, reduce, {out: { inline: 1 }});
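To see why a user with a single review never reaches reduce, the run can be imitated in plain JavaScript. This is only a rough sketch: `runMapReduce` is a hypothetical helper, not a MongoDB API, `emit` is passed to map as an argument instead of being a global, and the real server may also split one key's values across several reduce calls.

```javascript
// Hypothetical stand-in for a map reduce run; not part of MongoDB.
function runMapReduce(docs, map, reduce) {
    var emitted = {};
    docs.forEach(function (doc) {
        // Call map with the document as `this`, as MongoDB does.
        map.call(doc, function (key, value) {
            if (!emitted.hasOwnProperty(key)) {
                emitted[key] = [];
            }
            emitted[key].push(value);
        });
    });
    var results = {};
    Object.keys(emitted).forEach(function (key) {
        var values = emitted[key];
        // A key with exactly one emitted value skips reduce entirely.
        results[key] = values.length === 1 ? values[0] : reduce(key, values);
    });
    return results;
}

var docs = [
    { business_id: 1, user_id: 1, stars: 1 },
    { business_id: 2, user_id: 1, stars: 2 },
    { business_id: 2, user_id: 2, stars: 3 }
];

var reduceCalls = [];
var results = runMapReduce(
    docs,
    function (emit) {
        emit(this.user_id, { ratings: [[this.business_id, this.stars]] });
    },
    function (key, values) {
        reduceCalls.push(key); // record which keys were actually reduced
        var result = { ratings: [] };
        values.forEach(function (value) {
            result.ratings = result.ratings.concat(value.ratings);
        });
        return result;
    }
);

console.log(reduceCalls);             // only user "1" was reduced
console.log(JSON.stringify(results)); // user "2" keeps the raw emitted value
```

Because user 2's value is passed through untouched, the emitted value has to look like a finished result already.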
Alternatively, use the aggregation framework:
db.test.aggregate([{
    $group: {
        _id: "$user_id",
        ratings: {$addToSet: {business_id: "$business_id", stars: "$stars"}}
    }
}]);
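For readers unfamiliar with `$group` and `$addToSet`, the following plain-JavaScript sketch approximates what that pipeline computes. `groupRatings` is a hypothetical helper written for illustration, it assumes numeric user ids, and it compares entries by their serialized form, which is a simplification of how MongoDB decides that two values are equal. MongoDB also does not guarantee the order of elements produced by `$addToSet`.

```javascript
// Hypothetical helper that mimics the $group stage in plain JavaScript.
function groupRatings(docs) {
    var byUser = {};
    docs.forEach(function (doc) {
        var entry = { business_id: doc.business_id, stars: doc.stars };
        if (!byUser.hasOwnProperty(doc.user_id)) {
            byUser[doc.user_id] = [];
        }
        var seen = byUser[doc.user_id];
        // Like $addToSet, keep an entry only if an equal one is not present.
        var duplicate = seen.some(function (existing) {
            return JSON.stringify(existing) === JSON.stringify(entry);
        });
        if (!duplicate) {
            seen.push(entry);
        }
    });
    return Object.keys(byUser).map(function (id) {
        return { _id: Number(id), ratings: byUser[id] };
    });
}

var sample = [
    { business_id: 1, user_id: 1, stars: 1 },
    { business_id: 2, user_id: 1, stars: 2 },
    { business_id: 2, user_id: 1, stars: 2 }, // duplicate review, dropped
    { business_id: 2, user_id: 2, stars: 3 }
];

var grouped = groupRatings(sample);
console.log(JSON.stringify(grouped));
```

Note that this route yields `{business_id, stars}` objects per rating instead of `[business, rating]` pairs.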