MapReduce Python
I'm completely new to Python and MapReduce. It would be great if someone could help me achieve the results below.
I want to calculate the count of each key and the average of the values per key from a list like the one below. The first number in each pair is the key and the second is the value.
The output will look as below.
Thank you
I would recommend using itertools instead of reduce.
import itertools
import statistics
data = [[1,5], [1,5], [2,7], [2,8], [1,10], [2,10], [3,3], [1,20]]
# First, sort and group the input by key
sorted_data = sorted(data, key=lambda x: x[0])
grouped = itertools.groupby(sorted_data, lambda e: e[0])
# This will result in a structure like this:
# [
# (1, [[1, 5], [1, 5], [1, 10], [1, 20]]),
# (2, [[2, 7], [2, 8], [2, 10]]),
# (3, [[3, 3]])
# ]
# Remove the duplicate keys from the structure
remove_duplicate_keys = map(lambda x: (x[0], [e[1] for e in x[1]]), grouped)
# This will produce the following structure:
# [
# (1, [5, 5, 10, 20]),
# (2, [7, 8, 10]),
# (3, [3])
# ]
# Now, calculate count and mean for each entry
result = map(lambda x: (x[0], len(x[1]), statistics.mean(x[1])), remove_duplicate_keys)
# This will result in the following list:
# [(1, 4, 10), (2, 3, 8.333333333333334), (3, 1, 3)]
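For comparison, since the question asks about reduce: the same counts and averages can also be accumulated in a single pass with functools.reduce, without sorting first. This is only a sketch of one possible approach; the helper name `accumulate` and the dict-of-tuples accumulator are illustrative choices, not part of the answer above.

```python
from functools import reduce

data = [[1, 5], [1, 5], [2, 7], [2, 8], [1, 10], [2, 10], [3, 3], [1, 20]]


def accumulate(acc, pair):
    """Fold one [key, value] pair into a dict of key -> (count, total)."""
    key, value = pair
    count, total = acc.get(key, (0, 0))
    acc[key] = (count + 1, total + value)
    return acc


# Reduce step: walk the list once, keeping a running count and sum per key
totals = reduce(accumulate, data, {})

# Final step: turn each (count, total) into (key, count, mean), ordered by key.
# Note that the division always yields floats here (10.0 rather than 10).
summary = [(key, count, total / count) for key, (count, total) in sorted(totals.items())]
print(summary)
```

Unlike the groupby version, this does not need the input to be sorted, because the dict collects values for a key wherever they appear in the list.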
Note: itertools.groupby and map return lazy iterators (sorted is the exception and returns a regular list). This means Python will not calculate anything until you start consuming the result, and you can only iterate over the elements once. If you need them in a regular list or need to access the information multiple times, replace the `result = map(...)` line with this:
result = list(map(lambda x: (x[0], len(x[1]), statistics.mean(x[1])), remove_duplicate_keys))
This will convert the original lazy iterator chain into a regular list.
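The one-pass behaviour can be checked with a short, self-contained sketch. The helper name `summarize` is made up for this illustration; it combines the two map steps from the answer into one function.

```python
import itertools
import statistics

data = [[1, 5], [1, 5], [2, 7], [2, 8], [1, 10], [2, 10], [3, 3], [1, 20]]


def summarize(group):
    """Turn one (key, pairs) group from groupby into (key, count, mean)."""
    key, pairs = group
    values = [pair[1] for pair in pairs]
    return (key, len(values), statistics.mean(values))


sorted_data = sorted(data, key=lambda pair: pair[0])

# Nothing is computed yet: map and groupby only build a lazy pipeline
lazy_result = map(summarize, itertools.groupby(sorted_data, key=lambda pair: pair[0]))

first_pass = list(lazy_result)   # consumes the iterator and computes everything
second_pass = list(lazy_result)  # the iterator is already exhausted

print(first_pass)   # the three (key, count, mean) tuples
print(second_pass)  # an empty list
```

Keeping `first_pass` around is what makes the results reusable; the `lazy_result` object itself cannot be rewound.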