Sum column values in a dataframe if values in another column are next to each other
Hello, I have a dataframe:
import pandas as pd
df1 = {'name': ["x","x","x","x","x","x","x","y","y","y","y","y","y","y"],
'a': [3,4,5,11,14,15,16,2,3,4,10,13,14,15],
'b': [9,8,7,12,23,22,21,8,7,6,11,22,21,20],
'val': [2,1,3,4,5,6,3,21,11,31,41,51,61,31]
}
df1 = pd.DataFrame (df1, columns = ['name','a','b','val'])
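I want to sum the numbers in column 'val' whenever the numbers in column 'a' are next to each other (consecutive). For example, 'a' contains 3, 4, 5 (next to each other), so their associated numbers in 'val' are added up (i.e. 2+1+3), and a new column is created that holds this sum on each of those rows. What makes it harder for me is that the rows also have to be grouped by 'name'.
I don't know how well I've explained this, but here is the dataframe I'd like to end up with: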
df2 = {'name': ["x","x","x","x","x","x","x","y","y","y","y","y","y","y"],
'a': [3,4,5,11,14,15,16,2,3,4,10,13,14,15],
'b': [9,8,7,12,23,22,21,8,7,6,11,22,21,20],
'val': [2,1,3,4,5,6,3,21,11,31,41,51,61,31],
'sum_val': [6,6,6,4,14,14,14,63,63,63,41,143,143,143]
}
df2 = pd.DataFrame (df2, columns = ['name','a','b','val','sum_val'])
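To make the rule behind `sum_val` concrete, here is a minimal plain-Python sketch of it for a single name. The helper `run_sums` is hypothetical (it is not part of the question or of pandas) and it assumes the rows are already in the order shown above.

```python
from itertools import groupby

def run_sums(a_values, val_values):
    """Hypothetical helper: sum val over runs of consecutive integers in a_values."""
    out = []
    # Two neighbouring rows have the same (value - position) offset exactly
    # when the second value is the first plus 1, so grouping adjacent rows
    # by that offset splits the list into runs of consecutive numbers.
    for _, run in groupby(enumerate(a_values), key=lambda p: p[1] - p[0]):
        positions = [i for i, _ in run]
        total = sum(val_values[i] for i in positions)
        # Every row of the run receives the run's total
        out.extend([total] * len(positions))
    return out

# Rows for name == "x" from the question
x_sums = run_sums([3, 4, 5, 11, 14, 15, 16], [2, 1, 3, 4, 5, 6, 3])
print(x_sums)  # [6, 6, 6, 4, 14, 14, 14]
```

Running the same helper on the "y" rows gives 63, 41 and 143 for the three runs, which matches the `sum_val` column above.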
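Create the groups by checking, per name, where the difference between successive values of 'a' is not equal to 1 and taking the cumulative sum of that result in a lambda function; then pass the resulting Series to GroupBy.transform with sum: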
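# Number the runs of consecutive 'a' values within each name
# (a new run starts wherever the difference to the previous row is not 1)
g = df1.groupby('name')['a'].transform(lambda x: x.diff().ne(1).cumsum())
# Sum 'val' per (run, name) and broadcast the sum back to every row
df1['sum_val'] = df1.groupby([g, 'name'])['val'].transform('sum')
print(df1)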
    name   a   b  val  sum_val
0      x   3   9    2        6
1      x   4   8    1        6
2      x   5   7    3        6
3      x  11  12    4        4
4      x  14  23    5       14
5      x  15  22    6       14
6      x  16  21    3       14
7      y   2   8   21       63
8      y   3   7   11       63
9      y   4   6   31       63
10     y  10  11   41       41
11     y  13  22   51      143
12     y  14  21   61      143
13     y  15  20   31      143
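If the one-liner is hard to follow, it may help to look at the intermediate Series one step at a time. The sketch below is a variation on the answer, not its exact code: it uses `groupby('name')['a'].diff()` instead of a lambda, and the names `step`, `new_run` and `run_id` are only illustrative. Column 'b' is left out because it plays no part in the calculation.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ["x"] * 7 + ["y"] * 7,
    'a': [3, 4, 5, 11, 14, 15, 16, 2, 3, 4, 10, 13, 14, 15],
    'val': [2, 1, 3, 4, 5, 6, 3, 21, 11, 31, 41, 51, 61, 31],
})

# Step 1: difference to the previous 'a' within each name
# (NaN on the first row of every name)
step = df1.groupby('name')['a'].diff()

# Step 2: True wherever a new run starts, i.e. the difference is not 1
# (NaN is also "not 1", so the first row of a name always starts a run)
new_run = step.ne(1)

# Step 3: a running count of run starts gives each run its own id
run_id = new_run.cumsum()

# Sum 'val' within each run and broadcast the sum back to every row
df1['sum_val'] = df1.groupby(['name', run_id])['val'].transform('sum')

print(df1.assign(step=step, new_run=new_run, run_id=run_id))
```

Because the cumulative sum here runs over the whole column rather than per name, the run ids never repeat across names, so grouping by 'name' as well is redundant in this variant; it is kept only to mirror the answer.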