
Group list-of-tuples by second element, take average of first element

I have a list of tuples (x, y) like:

l = [(2,1), (4,6), (3,1), (2,7), (7,10)]

Now I want to make a new list:

l = [(2.5,1), (4,6), (2,7), (7,10)]

The new list should contain the average of the first values (x) of all tuples that share the same second value (y).

Here, (2,1) and (3,1) share the second element y=1, so the new list contains the average of x=2 and x=3. No other y value is repeated, so the remaining tuples stay unchanged.

Since you tagged pandas:

import pandas as pd

l = [(2,1), (4,6), (3,1), (2,7), (7,10)]
df = pd.DataFrame(l)

Then df is a data frame with two columns:

    0   1
0   2   1
1   4   6
2   3   1
3   2   7
4   7   10

Now you want to compute the average of the numbers in column 0 that share the same value in column 1:

(df.groupby(1).mean()     # compute mean on each group
   .reset_index()[[0,1]]  # restore the column order
   .values                # return the underlying numpy array
 )

Output:

array([[ 2.5,  1. ],
       [ 4. ,  6. ],
       [ 2. ,  7. ],
       [ 7. , 10. ]])
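
The question asks for a list of tuples rather than a NumPy array. One possible way to get that from the same groupby is sketched below. The column names x and y are chosen here for readability and are not part of the original answer.

```python
import pandas as pd

l = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]
df = pd.DataFrame(l, columns=["x", "y"])

# Mean of x within each y group; the result is a Series indexed by y.
means = df.groupby("y")["x"].mean()

# Series.items() yields (index, value) pairs, i.e. (y, mean_x); swap them back to (x, y).
result = [(float(x), int(y)) for y, x in means.items()]
print(result)
```

By default groupby sorts the group keys, so the tuples come out ordered by y.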

First build a dict that maps each second element to the list of first elements that occur with it. Then a list comprehension over the dict's items gives the desired output.

from collections import defaultdict
out = defaultdict(list)
for x, y in l:
    out[y].append(x)  # collect every x seen with this y
out = [(sum(v) / len(v), k) for k, v in out.items()]
print(out)
# prints [(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]
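
The same dictionary-based idea can be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: the name average_by_second is made up for illustration, and it uses statistics.mean from the standard library in place of sum(v)/len(v). It assumes Python 3.7+, where dicts keep insertion order.

```python
from collections import defaultdict
from statistics import mean


def average_by_second(pairs):
    """Average the first elements of (x, y) pairs that share the same y."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[y].append(x)
    # Dicts preserve insertion order, so groups appear in order of first occurrence of y.
    return [(mean(xs), y) for y, xs in groups.items()]


l = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]
print(average_by_second(l))
```

Unlike the sort-based approaches, this keeps the groups in the order in which each y first appears in the input.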

Another way, using groupby:

from itertools import groupby

# Sort list by the second element
sorted_list = sorted(l,key=lambda x:x[1])

# Group by second element
grouped_list = groupby(sorted_list, key=lambda x:x[1])

result = []
for _,group in grouped_list:
    x,y = list(zip(*group))
    # Take the mean of the first elements
    result.append((sum(x) / len(x),y[0]))

You get:

[(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]
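
The sort step matters because itertools.groupby only merges consecutive items with equal keys. A small sketch (the helper name second and the variable names are illustrative) comparing the group keys with and without sorting:

```python
from itertools import groupby

l = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]


def second(pair):
    return pair[1]


# Without sorting, the two tuples with y == 1 are not adjacent,
# so they end up in two separate groups.
unsorted_keys = [k for k, _ in groupby(l, key=second)]

# After sorting by y, equal keys are adjacent and each y appears once.
sorted_keys = [k for k, _ in groupby(sorted(l, key=second), key=second)]

print(unsorted_keys)  # [1, 6, 1, 7, 10]
print(sorted_keys)    # [1, 6, 7, 10]
```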

Here is a method using numpy.bincount. It relies on the labels being nonnegative integers. (If this is not the case, one can run np.unique(i, return_inverse=True) first and use the inverse indices as labels.)

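import numpy as np

w,i = zip(*l)
# n: sum of first elements per label, d: number of tuples per label
n,d = np.bincount(i,w), np.bincount(i)
# keep only the labels that actually occur
v, = np.where(d)
[*zip(n[v]/d[v],v)]
# [(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]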
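
For labels that are not nonnegative integers, the np.unique route mentioned above might look like the sketch below. The string labels and variable names are hypothetical example data, not from the original answer.

```python
import numpy as np

# Hypothetical input with string labels instead of nonnegative integers.
pairs = [(2, "a"), (4, "b"), (3, "a"), (2, "c"), (7, "d")]
values, labels = zip(*pairs)

# uniq holds the sorted distinct labels; inv maps each input label to its index in uniq.
uniq, inv = np.unique(labels, return_inverse=True)

sums = np.bincount(inv, weights=values)  # sum of values per label
counts = np.bincount(inv)                # number of tuples per label

# Convert NumPy scalars back to plain Python types for a clean list of tuples.
result = [(float(s / c), str(u)) for s, c, u in zip(sums, counts, uniq)]
print(result)
```

Because inv contains every index from 0 to len(uniq) - 1 at least once, no count is zero and the np.where filtering step is not needed here.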


 