Python: sum values of the third column if two columns have the same value
I have the following DataFrame df:
a b i
0 1.0 3.0 2.0
1 1.0 3.0 3.0
2 1.0 3.0 1.0
3 1.0 3.0 3.0
4 1.0 3.0 7.0
5 1.0 3.0 8.0
6 1.0 4.0 4.0
7 1.0 4.0 0.0
8 1.0 3.0 2.0
9 1.0 3.0 1.0
10 1.0 3.0 2.0
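For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above. It assumes pandas is installed and imported as pd; the values are copied from the table.

```python
import pandas as pd

# Rebuild the example frame from the table above (11 rows, columns a, b, i).
df = pd.DataFrame({
    "a": [1.0] * 11,
    "b": [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    "i": [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})
print(df)
```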
I want to sum i for each identical pair of a and b, so that I get df2:
a b i
0 1.0 3.0 31.0
1 1.0 4.0 4.0
2 1.0 3.0 0.0
I tried:
df2 = df.groupby(['a', 'b']).sum(['i']).reset_index()
I think you need to select column i after the groupby, so that it is the column passed to the sum function:
df2 = df.groupby(['a', 'b'])['i'].sum().reset_index()
print(df2)
a b i
0 1.0 3.0 29.0
1 1.0 4.0 4.0
Or pass the parameter as_index=False so that a DataFrame is returned:
df2 = df.groupby(['a', 'b'], as_index=False)['i'].sum()
print(df2)
a b i
0 1.0 3.0 29.0
1 1.0 4.0 4.0
Another solution, if necessary, is to group the Series directly:
df2 = df.i.groupby([df.a, df.b]).sum().reset_index()
print(df2)
a b i
0 1.0 3.0 29.0
1 1.0 4.0 4.0
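As a quick check, here is a self-contained sketch that runs the three variants side by side on the question's data. The names out1, out2 and out3 are only illustrative, and the frame is rebuilt from the table in the question.

```python
import pandas as pd

# Example frame, rebuilt from the table in the question.
df = pd.DataFrame({
    "a": [1.0] * 11,
    "b": [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    "i": [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

# Variant 1: select column i after groupby, then flatten the index.
out1 = df.groupby(["a", "b"])["i"].sum().reset_index()
# Variant 2: keep the group keys as columns from the start.
out2 = df.groupby(["a", "b"], as_index=False)["i"].sum()
# Variant 3: group the Series i by the two key Series.
out3 = df["i"].groupby([df["a"], df["b"]]).sum().reset_index()

print(out1)
```

All three should give one row per distinct (a, b) pair, regardless of where the rows sit in df.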
EDIT:
If you need to keep groups apart by their position in df, so that only consecutive rows with the same pair are summed together, group by a helper Series g and use aggregate:
ab = df[['a','b']]
# compare each row with the previous one
print(ab.ne(ab.shift()))
a b
0 True True
1 False False
2 False False
3 False False
4 False False
5 False False
6 False True
7 False False
8 False True
9 False False
10 False False
# check whether at least one column changed in each row
print(ab.ne(ab.shift()).any(axis=1))
0 True
1 False
2 False
3 False
4 False
5 False
6 True
7 False
8 True
9 False
10 False
dtype: bool
# use the cumulative sum of the boolean Series as group labels
g = ab.ne(ab.shift()).any(axis=1).cumsum()
print(g)
0 1
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 3
9 3
10 3
dtype: int32
print(df.groupby(g).agg(dict(a='first', b='first', i='sum')))
a b i
1 1.0 3.0 24.0
2 1.0 4.0 4.0
3 1.0 3.0 5.0
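Putting the steps of the edit together, a self-contained sketch could look like this. It assumes a recent pandas version, where axis is best passed as a keyword to any, and it adds reset_index(drop=True) only to get a 0-based index like the one in the question. The names changed and out are illustrative.

```python
import pandas as pd

# Example frame, rebuilt from the table in the question.
df = pd.DataFrame({
    "a": [1.0] * 11,
    "b": [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    "i": [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

ab = df[["a", "b"]]
# True wherever a or b differs from the previous row (the first row is always True).
changed = ab.ne(ab.shift()).any(axis=1)
# Running count of changes: one label per run of consecutive identical pairs.
g = changed.cumsum()

out = (
    df.groupby(g)
    .agg({"a": "first", "b": "first", "i": "sum"})
    .reset_index(drop=True)  # replace the 1, 2, 3 group labels with 0, 1, 2
)
print(out)
```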
You want to check whether the a, b combination has changed from the previous row, and then do a cumsum to establish a grouping array:
ab = df[['a', 'b']].apply(tuple, axis=1)
# reindex(columns=...) replaces reindex_axis, which newer pandas versions no longer provide
df.groupby(ab.ne(ab.shift()).cumsum()) \
.agg(dict(a='last', b='last', i='sum')) \
.reindex(columns=df.columns.tolist())
Breaking it down:
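ab = df[['a', 'b']].apply(tuple, axis=1)
Build a Series of (a, b) tuples, one per row, so that each pair can be compared as a single value.
ab.ne(ab.shift())
Mark every row whose pair differs from the pair in the previous row.
ab.ne(ab.shift()).cumsum()
Each change adds a True value to the cumulative sum. This creates a handy grouping label for each contiguous run of identical pairs of a and b.
.agg(dict(a='last', b='last', i='sum'))
Take the last value of a and b, which is fine since I know it's the same throughout the group. Sum over column i.
.reindex(columns=df.columns.tolist())
Restore the original column order (older pandas versions spelled this .reindex_axis(df.columns.tolist(), 1)).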
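To see the intermediate grouping labels, here is a self-contained sketch of the tuple-based approach on the question's data. It assumes a recent pandas version and uses reindex(columns=...) for the last step; labels and out are illustrative names.

```python
import pandas as pd

# Example frame, rebuilt from the table in the question.
df = pd.DataFrame({
    "a": [1.0] * 11,
    "b": [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    "i": [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

# One (a, b) tuple per row.
ab = df[["a", "b"]].apply(tuple, axis=1)
# Increment the label every time the tuple differs from the previous row.
labels = ab.ne(ab.shift()).cumsum()
print(labels.tolist())  # [1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3]

out = (
    df.groupby(labels)
    .agg({"a": "last", "b": "last", "i": "sum"})
    .reindex(columns=df.columns.tolist())  # keep the original column order
)
print(out)
```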