
Python: sum values of the third column if two columns have the same value

I have the following dataframe df:

df
    a   b   i
0   1.0 3.0 2.0
1   1.0 3.0 3.0
2   1.0 3.0 1.0
3   1.0 3.0 3.0
4   1.0 3.0 7.0
5   1.0 3.0 8.0
6   1.0 4.0 4.0
7   1.0 4.0 0.0
8   1.0 3.0 2.0
9   1.0 3.0 1.0
10  1.0 3.0 2.0
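To make the examples below reproducible, the sample frame can be rebuilt like this. This is a minimal sketch; the values are copied from the printed frame and all three columns are assumed to be float64.

```python
import pandas as pd

# Rebuild the sample frame printed above (all columns float64)
df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})
print(df)
```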

I want to sum i over the rows that have the same pair of a and b values, so:

df2
    a   b   i
0   1.0 3.0 31.0
1   1.0 4.0 4.0
2   1.0 3.0 0.0

df2 = df2.groupby(['a', 'b']).sum(['i']).reset_index()

I think you need to select column i after groupby, so that sum is applied only to that column:

df2 = df.groupby(['a', 'b'])['i'].sum().reset_index()
print (df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0

Or pass the parameter as_index=False to get a DataFrame back directly:

df2 = df.groupby(['a', 'b'], as_index=False)['i'].sum()
print (df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0
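The difference between the two forms is the return type: selecting ['i'] and summing gives a Series indexed by an (a, b) MultiIndex, which is why reset_index is needed, while as_index=False keeps a and b as ordinary columns of a DataFrame. A small sketch, assuming df is the sample frame from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

# Selecting 'i' after groupby gives a Series with an (a, b) MultiIndex
s = df.groupby(['a', 'b'])['i'].sum()
print(type(s).__name__, list(s.index.names))   # Series ['a', 'b']

# as_index=False keeps the group keys as regular columns of a DataFrame
out = df.groupby(['a', 'b'], as_index=False)['i'].sum()
print(type(out).__name__, list(out.columns))   # DataFrame ['a', 'b', 'i']
```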

Another solution, if needed, is to group the Series i by the Series a and b:

df2 = df.i.groupby([df.a, df.b]).sum().reset_index()
print (df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0
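As a quick sanity check, the three variants can be compared on the sample data. This sketch compares them through to_dict('list') so that minor index-type differences between pandas versions do not matter:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

r1 = df.groupby(['a', 'b'])['i'].sum().reset_index()   # Series + reset_index
r2 = df.groupby(['a', 'b'], as_index=False)['i'].sum()  # as_index=False
r3 = df.i.groupby([df.a, df.b]).sum().reset_index()     # group a Series by Series

# All three give the same two rows: (1.0, 3.0) -> 29.0 and (1.0, 4.0) -> 4.0
assert r1.to_dict('list') == r2.to_dict('list') == r3.to_dict('list')
print(r1)
```

Note that all three merge every (a, b) pair regardless of position, so the two separate runs of (1.0, 3.0) end up in one group.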

EDIT:

If you need separate groups for each consecutive run of identical (a, b) values, i.e. groups determined by position in df, group by a helper Series g and use agg:

ab = df[['a','b']]

#compare each value with the value in the previous row
print (ab.ne(ab.shift()))
        a      b
0    True   True
1   False  False
2   False  False
3   False  False
4   False  False
5   False  False
6   False   True
7   False  False
8   False   True
9   False  False
10  False  False
#check if at least one column is True in each row
print (ab.ne(ab.shift()).any(axis=1))
0      True
1     False
2     False
3     False
4     False
5     False
6      True
7     False
8      True
9     False
10    False
dtype: bool
#cumulative sum of the boolean Series gives one id per consecutive run
g = ab.ne(ab.shift()).any(axis=1).cumsum()
print (g)
0     1
1     1
2     1
3     1
4     1
5     1
6     2
7     2
8     3
9     3
10    3
dtype: int32
print (df.groupby(g).agg(dict(a='first', b='first', i='sum')))
     a    b     i
1  1.0  3.0  24.0
2  1.0  4.0   4.0
3  1.0  3.0   5.0
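The same idea can be wrapped into a reusable function for any list of key columns. This is a sketch; sum_consecutive_runs is a hypothetical name, not a pandas API:

```python
import pandas as pd

def sum_consecutive_runs(frame, keys, value):
    """Hypothetical helper: sum `value` over each run of consecutive rows
    whose `keys` columns are identical; a new run starts when any key changes."""
    k = frame[keys]
    # True on the first row of each run (the first row compares against NaN)
    starts = k.ne(k.shift()).any(axis=1)
    run_id = starts.cumsum()
    # Keep the first key values of each run and sum the value column
    agg = {col: 'first' for col in keys}
    agg[value] = 'sum'
    return frame.groupby(run_id).agg(agg).reset_index(drop=True)

df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0] * 6 + [4.0] * 2 + [3.0] * 3,
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})
print(sum_consecutive_runs(df, ['a', 'b'], 'i'))
```

One caveat of the shift/ne approach: NaN never compares equal, so every row with a NaN in a key column would start its own run.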

You want to check whether the (a, b) combination has changed from the previous row, and take a cumsum of those changes to build a grouping array:

ab = df[['a', 'b']].apply(tuple, axis=1)

df.groupby(ab.ne(ab.shift()).cumsum()) \
  .agg(dict(a='last', b='last', i='sum')) \
  .reindex(columns=df.columns)


Breaking it down:

  • ab = df[['a', 'b']].apply(tuple, axis=1)
    • builds a Series of (a, b) tuples, so a change in either column shows up as a change in a single value
  • ab.ne(ab.shift())
    • True wherever a row's tuple differs from the previous row's tuple (the first row is compared against NaN, so it is always True)
  • ab.ne(ab.shift()).cumsum()
    • each True increments the cumulative sum, so every contiguous run of identical (a, b) pairs gets its own group number
  • .agg(dict(a='last', b='last', i='sum'))
    • specifies what to do with each column in each group: take the last value of a and b, which is safe because they are identical throughout the group, and sum column i
  • .reindex(columns=df.columns)
    • restores the original column order (the original answer used reindex_axis, which has since been removed from pandas)
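For comparison, the same "contiguous runs" behaviour exists in the standard library: itertools.groupby, used without sorting, only merges adjacent rows with equal keys. This pure-Python sketch (not pandas) reproduces the grouped sums from the question's rows:

```python
from itertools import groupby
from operator import itemgetter

rows = [  # (a, b, i) copied from the question's frame
    (1.0, 3.0, 2.0), (1.0, 3.0, 3.0), (1.0, 3.0, 1.0), (1.0, 3.0, 3.0),
    (1.0, 3.0, 7.0), (1.0, 3.0, 8.0), (1.0, 4.0, 4.0), (1.0, 4.0, 0.0),
    (1.0, 3.0, 2.0), (1.0, 3.0, 1.0), (1.0, 3.0, 2.0),
]

# groupby on unsorted input starts a new group whenever the (a, b) key changes,
# just like the ne/shift/cumsum trick above
runs = [
    (a, b, sum(r[2] for r in grp))
    for (a, b), grp in groupby(rows, key=itemgetter(0, 1))
]
print(runs)  # [(1.0, 3.0, 24.0), (1.0, 4.0, 4.0), (1.0, 3.0, 5.0)]
```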

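Statement: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For questions contact: yoyou2525@163.com.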

 
© 2020-2024 STACKOOM.COM