Mapreduce cannot sort by value [python]

The goal is to sort (key, value) pairs by value. The input is a JSON file. I have four methods: two pairs of mappers and reducers.

The input is similar to:

{ 
  id: 1, 
  user: {
    friends_count: 1
  } 
}
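
As a hedged illustration of the parsing step: the sketch below assumes each input line holds one JSON object with quoted keys, since `json.loads` rejects unquoted keys. `extract_pair` is a hypothetical helper name used only for this example, not part of the job.

```python
import json


def extract_pair(line):
    """Return (id, friends_count) from one JSON-encoded input line."""
    record = json.loads(line)
    return record["id"], record["user"]["friends_count"]


# Assumed shape of a single input line (keys quoted so it is valid JSON).
sample_line = '{"id": 1, "user": {"friends_count": 1}}'
print(extract_pair(sample_line))
```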

The output of the first mapper/reducer stage is something like:

A 1
B 2
C 3
D 4

What I want is:

1 A
2 B
3 C
4 D

In the first stage, sorting by key works fine, but in the second stage, where I try to make the value the key, an error is thrown:

TypeError: at 0x7fa43ea615a0> is not JSON serializable
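
This kind of error can be reproduced outside mrjob. A minimal sketch, assuming the output is encoded with the standard `json` module (the exact message differs between Python versions and JSON libraries): encoding a generator fails, while encoding the values drawn from it works. `value_stream` is a hypothetical stand-in for the values a reducer receives.

```python
import json


def value_stream():
    """Stand-in for the generator of values a reducer receives."""
    yield "A"


# Encoding the generator object itself raises TypeError.
try:
    json.dumps([1, value_stream()])
    error = None
except TypeError as exc:
    error = exc

# Encoding the values pulled out of the generator succeeds.
encoded = json.dumps([1, list(value_stream())])
print(type(error).__name__)
print(encoded)
```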

The code I am using is:

from mrjob.job import MRJob
from mrjob.step import MRStep
import json


class MRFrnsCounter(MRJob):
    def steps(self):
        # Two steps: count friends per user, then swap key and value.
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer),
            MRStep(mapper=self.mapper_two,
                   reducer=self.reducer_two)
        ]

    def mapper(self, _, line):
        f = json.loads(line)
        uid, frns = f["id"], f["user"]["friends_count"]
        yield uid, frns

    def reducer(self, uid, frns):
        yield uid, sum(frns)

    def mapper_two(self, uid, frns):
        yield frns, uid

    def reducer_two(self, frns, uid):
        yield frns, uid


if __name__ == '__main__':
    MRFrnsCounter.run()

The code breaks in the second mapper when the key and value are reversed. Any opinions would be appreciated.

Why not just yield sum(frns), uid in the first reducer?

However, the generator in your job turns up in the second reducer rather than the second mapper: reducer_two receives uid as a generator of values and yields it as-is, and a generator is not JSON serializable. Iterate through the generator and yield one frns, uid pair per value. Something like this:

for u in uid:
    yield frns, u
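
A plain-Python sketch of the fix, without mrjob. It assumes the framework hands the second reducer a key plus a generator of values; `shuffle` and `run_second_stage` are hypothetical stand-ins written for this example, and `shuffle` sorts keys numerically, which is an assumption rather than a statement about mrjob's own sort order.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(pairs):
    """Group (key, value) pairs by key, in sorted key order."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Each key arrives with a generator of its values.
        yield key, (value for _, value in group)


def reducer_two(frns, uids):
    # Iterate the generator and yield one plain pair per value.
    for uid in uids:
        yield frns, uid


def run_second_stage(first_stage):
    """Swap (uid, frns) to (frns, uid), then shuffle and reduce."""
    swapped = [(frns, uid) for uid, frns in first_stage]
    return [pair
            for key, values in shuffle(swapped)
            for pair in reducer_two(key, values)]


first_stage = [("A", 1), ("B", 2), ("C", 3), ("D", 4)]
print(run_second_stage(first_stage))
```

Every pair that leaves reducer_two here holds only plain values, so a JSON encoder can serialize it.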
