Trouble Pivoting data with Map Reduce

I am having trouble pivoting my dataset with map reduce. I've been using the MongoDB cookbook for help, but I'm getting some weird errors. I want to take the collection below and pivot it so that each user has a list of all of their review ratings.

My collection looks like this:

{
  'type': 'review',
  'business_id': (encrypted business id),
  'user_id': (encrypted user id),
  'stars': (star rating),
  'text': (review text),
}

Map function (wrapped in Python):

from bson.code import Code

map = Code("""
function(){
var key = {user : this.user_id};
var value = {ratings: [this.business_id, this.stars]};

emit(key, value);
}
""")

The map function should return an array of values associated with the key.

Reduce function (wrapped in Python):

reduce = Code("""
function(key, values){
var result = { value: [] };
temp = [];

for (var i = 0; i < values.length; i++){
temp.push(values[i].ratings);
}
result.value = temp;
return result;
}
""")

However, the results contain one fewer rating than the total. In fact, some users have None returned, which shouldn't be possible. Some entries look like the following:

{u'_id': {u'user': u'zwZytzNIayFoQVEG8Xcvxw'}, u'value': [None, [u'e9nN4XxjdHj4qtKCOPQ_vg', 3.0], None, [...]...]

I can't pinpoint what in my code is causing this. If a user has 3 reviews, each of those documents has a business ID and a rating. Plus, using 'values.length + 1' in my loop condition breaks values[i] for some reason.
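One way to see where the None entries could come from is to replay the reducer in plain JavaScript, outside MongoDB. The sketch below is only an illustration under stated assumptions: the three emitted values are made-up data, and the batching (two values first, then a second pass over the partial result plus the remaining value) is assumed for the demonstration. The first pass returns an object with a `value` field but no `ratings` field, so the second pass reads `undefined`, which surfaces as None in Python.

```javascript
// The reducer from the question, with the same logic.
function reduce(key, values) {
  var result = { value: [] };
  var temp = [];
  for (var i = 0; i < values.length; i++) {
    temp.push(values[i].ratings);
  }
  result.value = temp;
  return result;
}

// Three values as the map function would emit them for one user
// (made-up data, for illustration only).
var emitted = [
  { ratings: ["b1", 5] },
  { ratings: ["b2", 3] },
  { ratings: ["b3", 4] }
];

// Assumed batching: reduce the first two values, then reduce that
// partial result again together with the remaining value.
var partial = reduce("u1", emitted.slice(0, 2));
var finalResult = reduce("u1", [partial, emitted[2]]);

// partial has no `ratings` field, so its slot becomes undefined
// (serialized as null) and its two ratings are lost.
console.log(JSON.stringify(finalResult));
// {"value":[null,["b3",4]]}
```

The usual remedy is to make the reducer return the same shape that the map function emits, so that a second pass can read its own output.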

Edit 1

I've embraced the fact that reduce gets called multiple times on its own output, so below is my new reducer. This returns one flat array of [business, rating, business, rating]. Any idea how to output [business, rating] arrays instead of one giant array?

function(key, values){
var result = { ratings:[] };
var temp = [];
values.forEach(function(value){
    value.ratings.forEach(function(rating){
        if(temp.indexOf(rating) == -1){
            temp.push(rating);
        }
    });
});

result.ratings = temp;
return result;
}

Here's a test example:

1) Add some sample data:

db.test.drop();
db.test.insert(
  [{
    'type': 'review',
    'business_id': 1,
    'user_id': 1,
    'stars': 1,
  },
  {
    'type': 'review',
    'business_id': 2,
    'user_id': 1,
    'stars': 2,
  },
  {
    'type': 'review',
    'business_id': 2,
    'user_id': 2,
    'stars': 3,
  }]
);

2) Map function

var map = function() {
  // Emit the value in the same shape the reducer returns.
  emit(this.user_id, { ratings: [[this.business_id, this.stars]] });
};

Here we emit each value in the shape we want the results to have at the end of the process. Why? Because if a user (the key we are grouping by) has only a single review, that value never goes through a reduce phase.

3) Reduce function

var reduce = function(key, values) {
  var result = { ratings: [] };
  values.forEach(function(value) {
    // value.ratings holds one or more [business_id, stars] pairs
    value.ratings.forEach(function(pair) {
      result.ratings.push(pair);
    });
  });

  return result;
};

Here we collect all the values, remembering that we nested them in the map function. Because reduce can be called again on its own output, we append every pair from each value rather than only the first.
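To check this kind of map/reduce pair without a running server, the process can be approximated in plain JavaScript. The following is a minimal sketch under assumptions, not MongoDB's actual execution: `runMapReduce` is a hypothetical helper, `map` takes the document and an `emit` callback as arguments instead of using `this`, and the fourth sample document is an extra made-up review so that one user has three values. It uses a variant in which both map and reduce work with the shape `{ ratings: [[business_id, stars], ...] }`, and it folds values one at a time so the reducer is fed its own output.

```javascript
// Sample documents: the three from step 1 plus one made-up extra review.
var docs = [
  { type: "review", business_id: 1, user_id: 1, stars: 1 },
  { type: "review", business_id: 2, user_id: 1, stars: 2 },
  { type: "review", business_id: 2, user_id: 2, stars: 3 },
  { type: "review", business_id: 3, user_id: 1, stars: 5 }
];

// Map: emit the value in the same shape the reducer returns.
function map(doc, emit) {
  emit(doc.user_id, { ratings: [[doc.business_id, doc.stars]] });
}

// Reduce: append every [business_id, stars] pair from every value.
function reduce(key, values) {
  var result = { ratings: [] };
  values.forEach(function (value) {
    value.ratings.forEach(function (pair) {
      result.ratings.push(pair);
    });
  });
  return result;
}

// Hypothetical helper, not a MongoDB API: groups emitted values by key
// and folds them pairwise, so reduce repeatedly receives its own output.
function runMapReduce(docs, map, reduce) {
  var groups = new Map();
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  });
  var out = [];
  groups.forEach(function (values, key) {
    var acc = values[0]; // a key with a single value skips reduce entirely
    for (var i = 1; i < values.length; i++) {
      acc = reduce(key, [acc, values[i]]);
    }
    out.push({ _id: key, value: acc });
  });
  return out;
}

var results = runMapReduce(docs, map, reduce);
console.log(JSON.stringify(results));
// [{"_id":1,"value":{"ratings":[[1,1],[2,2],[3,5]]}},{"_id":2,"value":{"ratings":[[2,3]]}}]
```

User 2 has a single review and never reaches the reducer, yet its value has the same shape as user 1's, which is the reason for emitting the final shape from map.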

4) Run the map reduce:

db.test.mapReduce(map, reduce, {out: { inline: 1 }});

Alternative - use the aggregation framework:

db.test.aggregate([{
  $group: {
    _id: "$user_id",
    ratings: {$addToSet: {business_id: "$business_id", stars: "$stars"}}
  }
}]);
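To get a feel for what this grouping produces, here is a rough plain-JavaScript emulation. It is a sketch only: `groupRatings` is a hypothetical helper, not a MongoDB API, and the last sample document is a made-up exact duplicate added to show the set behaviour. The emulation keeps first-seen order, whereas `$addToSet` does not guarantee any particular order.

```javascript
// Hypothetical helper, not a MongoDB API: groups by user_id and collects
// distinct {business_id, stars} entries, roughly like $group + $addToSet.
function groupRatings(docs) {
  var byUser = new Map();
  docs.forEach(function (doc) {
    if (!byUser.has(doc.user_id)) {
      byUser.set(doc.user_id, { _id: doc.user_id, ratings: [], seen: new Set() });
    }
    var group = byUser.get(doc.user_id);
    var entry = { business_id: doc.business_id, stars: doc.stars };
    var fingerprint = JSON.stringify(entry); // used only to detect duplicates
    if (!group.seen.has(fingerprint)) {
      group.seen.add(fingerprint);
      group.ratings.push(entry);
    }
  });
  // Drop the bookkeeping set before returning the groups.
  return Array.from(byUser.values()).map(function (g) {
    return { _id: g._id, ratings: g.ratings };
  });
}

// The three sample documents plus a made-up exact duplicate of the first.
var docs = [
  { type: "review", business_id: 1, user_id: 1, stars: 1 },
  { type: "review", business_id: 2, user_id: 1, stars: 2 },
  { type: "review", business_id: 2, user_id: 2, stars: 3 },
  { type: "review", business_id: 1, user_id: 1, stars: 1 }
];

var grouped = groupRatings(docs);
console.log(JSON.stringify(grouped));
// [{"_id":1,"ratings":[{"business_id":1,"stars":1},{"business_id":2,"stars":2}]},{"_id":2,"ratings":[{"business_id":2,"stars":3}]}]
```

If every review should be kept, including exact duplicates, `$push` can be used in place of `$addToSet`.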
