
Trouble Pivoting data with Map Reduce

I am having trouble pivoting my dataset with map reduce. I've been using the MongoDB cookbook for help, but I'm getting some weird errors. I want to take the below collection and pivot it so that each user has a list of all of the review ratings.

My collection looks like this:

{
  'type': 'review',
  'business_id': (encrypted business id),
  'user_id': (encrypted user id),
  'stars': (star rating),
  'text': (review text),
}

Map function (wrapped in Python):

from bson.code import Code

map = Code("""
function () {
    var key = { user: this.user_id };
    var value = { ratings: [this.business_id, this.stars] };

    emit(key, value);
}
""")

For each key, the map output should end up as an array of values. Reduce function (wrapped in Python):

reduce = Code("""
function(key, values){
var result = { value: [] };
temp = [];

for (var i = 0; i < values.length; i++){
temp.push(values[i].ratings);
}
result.value = temp;
return result;
}
""")

However, the results contain one rating fewer than the total. Some users even have None entries, which should be impossible. Some entries look like the following:

{u'_id': {u'user': u'zwZytzNIayFoQVEG8Xcvxw'}, u'value': [None, [u'e9nN4XxjdHj4qtKCOPQ_vg', 3.0], None, [...]...]

I can't pinpoint what in my code is causing this. If there are 3 reviews, they all have business IDs and ratings in the document. Plus, using 'values.length + 1' in my loop condition breaks values[i] for some reason.

Edit 1

I've accepted that reduce can be called repeatedly, including on its own output, so below is my new reducer. It returns one flat array of [business, rating, business, rating]. Any idea how to output separate [business, rating] arrays instead of one giant array?

function(key, values){
var result = { ratings: [] };
var temp = [];
values.forEach(function(value){
    value.ratings.forEach(function(rating){
        if(temp.indexOf(rating) == -1){
            temp.push(rating);
        }
    });
});

result.ratings = temp;
return result;
}

Here's a test example:

1) Add some sample data:

db.test.drop();
db.test.insert(
  [{
    'type': 'review',
    'business_id': 1,
    'user_id': 1,
    'stars': 1,
  },
  {
    'type': 'review',
    'business_id': 2,
    'user_id': 1,
    'stars': 2,
  },
  {
    'type': 'review',
    'business_id': 2,
    'user_id': 2,
    'stars': 3,
  }]
);

2) Map function

var map = function() {
  // Emit the same shape that reduce returns: { ratings: [[business_id, stars], ...] }
  emit(this.user_id, { ratings: [[this.business_id, this.stars]] });
};

Here we emit each value in the shape we want the final result to have. Why? Because if a user (the key we are grouping by) has only a single review, that value never goes through a reduce phase.

3) Reduce function

var reduce = function(key, values) {
  var result = { ratings: [] };
  values.forEach(function(value) {
    // A value may be a fresh emit (one pair) or an earlier reduce result (many pairs).
    value.ratings.forEach(function(rating) {
      result.ratings.push(rating);
    });
  });

  return result;
};

Here we collect all the values. Because each one carries a nested list of [business_id, stars] pairs, and reduce may be called again on its own output, we append every pair from each list rather than only the first.
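The None entries from the question can be reproduced outside MongoDB. The following is a minimal sketch in plain JavaScript, not MongoDB's actual scheduling: `reduceInTwoPasses` is a hypothetical helper that reduces the first two values and then feeds that partial result back in with the rest, which is something MongoDB is allowed to do. The sample IDs `b1` to `b3` and `u1` are made up for illustration.

```javascript
// Plain-JavaScript simulation; no MongoDB required.

// Reducer from the question: reads `.ratings` but returns `{ value: [...] }`,
// so its output does not have the shape of its input.
function mismatchedReduce(key, values) {
  var temp = [];
  for (var i = 0; i < values.length; i++) {
    temp.push(values[i].ratings);
  }
  return { value: temp };
}

// Shape-preserving reducer: input values and output both look like
// { ratings: [[business_id, stars], ...] }.
function shapePreservingReduce(key, values) {
  var result = { ratings: [] };
  values.forEach(function (value) {
    value.ratings.forEach(function (pair) {
      result.ratings.push(pair);
    });
  });
  return result;
}

// Hypothetical helper: reduce the first two values, then reduce that
// partial result together with the remaining values.
function reduceInTwoPasses(reduceFn, key, values) {
  var partial = reduceFn(key, values.slice(0, 2));
  return reduceFn(key, [partial].concat(values.slice(2)));
}

// What the question's map emits: one flat [business_id, stars] per review.
var flatEmits = [
  { ratings: ['b1', 5] },
  { ratings: ['b2', 3] },
  { ratings: ['b3', 4] }
];

// What a shape-preserving map would emit: the pair nested inside a list.
var nestedEmits = flatEmits.map(function (v) {
  return { ratings: [v.ratings] };
});

var broken = reduceInTwoPasses(mismatchedReduce, 'u1', flatEmits);
var fixed = reduceInTwoPasses(shapePreservingReduce, 'u1', nestedEmits);

console.log(JSON.stringify(broken)); // undefined entries serialize as null
console.log(JSON.stringify(fixed));
```

In the broken case the partial result has no `ratings` field, so the second pass pushes `undefined` in its place and the first two ratings are lost. PyMongo would show such an entry as None.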

4) Run the map reduce:

db.test.mapReduce(map, reduce, { out: { inline: 1 } });

Alternative: use the aggregation framework:

db.test.aggregate([{
  $group: {
    _id: "$user_id",
    // $addToSet drops duplicate {business_id, stars} pairs; use $push to keep them all
    ratings: {$addToSet: {business_id: "$business_id", stars: "$stars"}}
  }
}]);
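To make the grouping concrete, here is a rough plain-JavaScript analogue of what `$group` with `$addToSet` computes for the sample data from step 1. It is only a sketch of the semantics, not how MongoDB implements it: `groupRatingsByUser` is a hypothetical helper, and MongoDB does not guarantee the order of elements produced by `$addToSet`.

```javascript
// Plain-JavaScript analogue of $group + $addToSet; no MongoDB required.
var reviews = [
  { business_id: 1, user_id: 1, stars: 1 },
  { business_id: 2, user_id: 1, stars: 2 },
  { business_id: 2, user_id: 2, stars: 3 }
];

// Hypothetical helper: one output document per user_id, each holding the
// distinct { business_id, stars } pairs seen for that user.
function groupRatingsByUser(docs) {
  var groups = {}; // user_id -> { _id, ratings, seen }
  docs.forEach(function (doc) {
    var group = groups[doc.user_id];
    if (!group) {
      group = groups[doc.user_id] = { _id: doc.user_id, ratings: [], seen: {} };
    }
    var rating = { business_id: doc.business_id, stars: doc.stars };
    var fingerprint = JSON.stringify(rating); // used only to detect duplicates
    if (!group.seen[fingerprint]) {
      group.seen[fingerprint] = true;
      group.ratings.push(rating);
    }
  });
  return Object.keys(groups).map(function (userId) {
    return { _id: groups[userId]._id, ratings: groups[userId].ratings };
  });
}

var grouped = groupRatingsByUser(reviews);
console.log(JSON.stringify(grouped, null, 2));
```

Skipping the duplicate check would give the behaviour of `$push`, which keeps every rating even when the same pair appears twice.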
