Average the duplicated values from two paired lists in Python
In my code I obtain two different lists from different sources, but I know they are in the same order. The first list ("names") contains key strings, while the second ("result_values") is a series of floats. I need to make the pairs unique, but I can't simply use a dictionary, as only the last value inserted would be kept. Instead, I need to take the average (arithmetic mean) of the values that share a duplicate key.
Example of the desired result:
names = ["pears", "apples", "pears", "bananas", "pears"]
result_values = [2, 1, 4, 8, 6] # ints here but it's the same conceptually
combined_result = average_duplicates(names, result_values)
print combined_result
{"pears": 4, "apples": 1, "bananas": 8}
My only ideas involve multiple iterations and so far have been ugly. Is there an elegant solution to this problem?
from collections import defaultdict

def averages(names, values):
    # Group the items by name.
    value_lists = defaultdict(list)
    for name, value in zip(names, values):
        value_lists[name].append(value)

    # Take the average of each list.
    result = {}
    for name, values in value_lists.iteritems():
        result[name] = sum(values) / float(len(values))
    return result

names = ["pears", "apples", "pears", "bananas", "pears"]
result_values = [2, 1, 4, 8, 6]
print averages(names, result_values)
I would use a dictionary anyway:
averages = {}
counts = {}
for name, value in zip(names, result_values):
    if name in averages:
        averages[name] += value
        counts[name] += 1
    else:
        averages[name] = value
        counts[name] = 1
for name in averages:
    averages[name] = averages[name] / float(counts[name])
If you're concerned about large lists, then I would replace zip with izip from itertools.
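For readers on Python 3, the same sum-and-count idea might be sketched as follows. There the built-in zip is already lazy, so izip is not needed, and / performs true division. The function name `average_by_key` is hypothetical and does not come from the answer above.

```python
def average_by_key(names, values):
    """Return {name: arithmetic mean of the values paired with that name}."""
    totals = {}
    counts = {}
    # zip is lazy in Python 3, so no intermediate list of pairs is built.
    for name, value in zip(names, values):
        totals[name] = totals.get(name, 0) + value
        counts[name] = counts.get(name, 0) + 1
    # In Python 3, / is true division, so no float() cast is needed.
    return {name: totals[name] / counts[name] for name in totals}


names = ["pears", "apples", "pears", "bananas", "pears"]
result_values = [2, 1, 4, 8, 6]
print(average_by_key(names, result_values))
```

Keeping the running totals rather than the individual values means memory use grows with the number of distinct keys, not with the length of the lists.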
You could calculate the mean using a cumulative moving average, so that you iterate through the lists only once:
from collections import defaultdict

averages = defaultdict(float)
count = defaultdict(int)
for name, result in zip(names, result_values):
    count[name] += 1
    averages[name] += (result - averages[name]) / count[name]
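The update relies on the identity new_mean = old_mean + (value - old_mean) / n, where n counts the values seen so far for that key. As a rough check, the sketch below (Python 3; the function name `running_averages` is hypothetical) wraps the loop in a function and compares one key against the mean computed directly.

```python
from collections import defaultdict
import math


def running_averages(names, values):
    """One-pass per-key mean using the cumulative moving average update."""
    averages = defaultdict(float)
    counts = defaultdict(int)
    for name, value in zip(names, values):
        counts[name] += 1
        # new_mean = old_mean + (value - old_mean) / n
        averages[name] += (value - averages[name]) / counts[name]
    return dict(averages)


names = ["pears", "apples", "pears", "bananas", "pears"]
result_values = [2, 1, 4, 8, 6]
result = running_averages(names, result_values)

# Compare with the mean computed directly for one key.
pear_values = [v for n, v in zip(names, result_values) if n == "pears"]
direct = sum(pear_values) / len(pear_values)
print(result["pears"], direct)
```

Because the incremental update rounds at every step, the two results can differ in the last few bits for general float inputs, so a comparison with math.isclose is safer than ==.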
I think what you're looking for is itertools.groupby:
import itertools

def average_duplicates(names, values):
    pairs = sorted(zip(names, values))
    result = {}
    for key, group in itertools.groupby(pairs, key=lambda p: p[0]):
        group_values = [value for (_, value) in group]
        result[key] = sum(group_values) / len(group_values)
    return result
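The sorted() call in the function above matters because itertools.groupby only groups consecutive items with equal keys. A small illustration (Python 3; the helper name `first` is hypothetical):

```python
import itertools

names = ["pears", "apples", "pears", "bananas", "pears"]
result_values = [2, 1, 4, 8, 6]
pairs = list(zip(names, result_values))


def first(pair):
    """Key function: the name part of a (name, value) pair."""
    return pair[0]


# Without sorting, each run of equal adjacent keys becomes its own group,
# so "pears" shows up three times.
unsorted_keys = [key for key, _ in itertools.groupby(pairs, key=first)]

# After sorting, equal keys are adjacent, so each key appears exactly once.
sorted_keys = [key for key, _ in itertools.groupby(sorted(pairs), key=first)]

print(unsorted_keys)
print(sorted_keys)
```

The sort makes this approach O(n log n), whereas the dictionary-based answers run in a single O(n) pass.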
>>> def avg_list(keys, values):
...     def avg(series):
...         return sum(series) / len(series)
...     from collections import defaultdict
...     d = defaultdict(list)
...     for k, v in zip(keys, values):
...         d[k].append(v)
...     return dict((k, avg(v)) for k, v in d.iteritems())
...
>>> if __name__ == '__main__':
...     names = ["pears", "apples", "pears", "bananas", "pears"]
...     result_values = [2, 1, 4, 8, 6]
...     print avg_list(names, result_values)
...
{'apples': 1, 'pears': 4, 'bananas': 8}
You can have avg() return sum(series) / float(len(series)) if you want a floating-point average.
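The session above is Python 2, where dividing two ints truncates. As a sketch of how the same function might look on Python 3 (an adaptation, not the answerer's code), iteritems() becomes items() and / already returns a float for int operands:

```python
from collections import defaultdict


def avg_list(keys, values):
    """Group values by key, then average each group (Python 3)."""
    grouped = defaultdict(list)
    for k, v in zip(keys, values):
        grouped[k].append(v)
    # For int operands, / yields a float in Python 3; // would floor instead.
    return {k: sum(v) / len(v) for k, v in grouped.items()}


names = ["pears", "apples", "pears", "bananas", "pears"]
result_values = [2, 1, 4, 8, 6]
print(avg_list(names, result_values))

# A case where integer and true division disagree: (1 + 2) / 2.
print(avg_list(["a", "a"], [1, 2]))
```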