Sum a column in a pandas dataframe where a condition is met in one column, but grouped by another

I have a dataframe like this:

        Ref_No  Definition  Total_to_Add
    0   ref1        B            20        
    1   ref2        A            30        
    2   ref1        B            40        
    3   ref2        A            50        
    4   ref1        B            60        
    5   ref2        B            50         
    6   ref1        B            60        
    7   ref2        B            50        
    8   ref1        B            60        

For each reference, I want to sum Total_to_Add over the rows that are 'B' and share the same reference number (I'll have another column for A). There are hundreds of reference numbers.

I can sum the rows that meet a single condition like this:

df['ANSWER'] = df[df['Definition']=='A']['Total_to_Add'].sum()

Or I can group by reference like this:

df['ANSWER']=(df.groupby('Ref_No')['Total_to_Add'].transform('sum'))

But I can't seem to combine these two, i.e. create a new column that totals Total_to_Add where the definition is 'B', grouped by Ref_No.

I'm aiming for output like the below:

        Ref_No  Definition  Total_to_Add  Total_'B'
    0   ref1        B            20        240
    1   ref2        A            30        100
    2   ref1        B            40        240
    3   ref2        A            50        100
    4   ref1        B            60        240
    5   ref2        B            50        100 
    6   ref1        B            60        240
    7   ref2        B            50        100
    8   ref1        B            60        240

Any wisdom appreciated! Thanks

Replace non-B values with 0 using Series.where, then use GroupBy.transform:

df['ANSWER']= (df['Total_to_Add'].where(df.Definition=='B', 0)
                                 .groupby(df['Ref_No']).transform('sum'))
print (df)
  Ref_No Definition  Total_to_Add  Total_'B'  ANSWER
0   ref1          B            20        240     240
1   ref2          A            30        100     100
2   ref1          B            40        240     240
3   ref2          A            50        100     100
4   ref1          B            60        240     240
5   ref2          B            50        100     100
6   ref1          B            60        240     240
7   ref2          B            50        100     100
8   ref1          B            60        240     240
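Since the question mentions wanting a second column for 'A', the same where-then-transform pattern can be looped over the labels. This is a minimal sketch: the DataFrame is rebuilt from the question's sample data, and the column names Total_A and Total_B are assumptions chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Ref_No': ['ref1', 'ref2', 'ref1', 'ref2', 'ref1', 'ref2', 'ref1', 'ref2', 'ref1'],
    'Definition': ['B', 'A', 'B', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Total_to_Add': [20, 30, 40, 50, 60, 50, 60, 50, 60],
})

# One column per definition (column names are hypothetical):
# zero out rows with a different label, then sum within each Ref_No.
for label in ['A', 'B']:
    df[f'Total_{label}'] = (df['Total_to_Add']
                            .where(df['Definition'] == label, 0)
                            .groupby(df['Ref_No'])
                            .transform('sum'))

print(df)
```

Because the non-matching rows are replaced with 0 rather than dropped, every row keeps its place and receives its group's total.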

Try:

df['Total_B'] = (df['Definition'].eq('B').mul(df['Total_to_Add'])
                 .groupby(df['Ref_No']).transform('sum'))

[out]

  Ref_No Definition  Total_to_Add  Total_B
0   ref1          B            20      240
1   ref2          A            30      100
2   ref1          B            40      240
3   ref2          A            50      100
4   ref1          B            60      240
5   ref2          B            50      100
6   ref1          B            60      240
7   ref2          B            50      100
8   ref1          B            60      240
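To see why multiplying by the boolean mask works, it can help to split the one-liner into steps. This is only an illustrative breakdown on the question's sample data; the intermediate names is_b and b_only are not from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Ref_No': ['ref1', 'ref2', 'ref1', 'ref2', 'ref1', 'ref2', 'ref1', 'ref2', 'ref1'],
    'Definition': ['B', 'A', 'B', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Total_to_Add': [20, 30, 40, 50, 60, 50, 60, 50, 60],
})

# Step 1: boolean mask, True where Definition is 'B'.
is_b = df['Definition'].eq('B')

# Step 2: True acts as 1 and False as 0, so non-B amounts become 0.
b_only = is_b.mul(df['Total_to_Add'])

# Step 3: sum the surviving amounts per Ref_No and broadcast back to every row.
df['Total_B'] = b_only.groupby(df['Ref_No']).transform('sum')

print(b_only.tolist())
print(df)
```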

This will produce the sum of 'Total_to_Add' in the 'Total_B' column where 'Definition' == 'B'. Note that rows with any other 'Definition' are filtered out before grouping, so they are left as NaN in 'Total_B' instead of carrying their group's total:

df['Total_B'] = df[df['Definition']=='B'].groupby(by=['Ref_No','Definition'])['Total_to_Add'].transform('sum')
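Filtering before grouping means the result only has entries for the 'B' rows, so assigning it back aligns on the index and leaves the other rows empty. The sketch below shows that on the question's sample data, then one possible way to fill every row: compute the per-reference 'B' totals and map them onto Ref_No. The column name Total_B_filtered and the variable b_totals are assumptions for illustration, not part of the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Ref_No': ['ref1', 'ref2', 'ref1', 'ref2', 'ref1', 'ref2', 'ref1', 'ref2', 'ref1'],
    'Definition': ['B', 'A', 'B', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Total_to_Add': [20, 30, 40, 50, 60, 50, 60, 50, 60],
})

# Filter first, then transform: the result covers only the 'B' rows,
# so the 'A' rows (index 1 and 3) become NaN on assignment.
df['Total_B_filtered'] = (df[df['Definition'] == 'B']
                          .groupby('Ref_No')['Total_to_Add']
                          .transform('sum'))

# Alternative: one total per Ref_No, mapped back onto every row.
# fillna(0) covers references that have no 'B' rows at all.
b_totals = df[df['Definition'] == 'B'].groupby('Ref_No')['Total_to_Add'].sum()
df['Total_B'] = df['Ref_No'].map(b_totals).fillna(0)

print(df)
```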

I would use transform together with mask:

s=df['Total_to_Add'].mask(df.Definition!='B').groupby(df['Ref_No']).transform('sum')
s
0    240.0
1    100.0
2    240.0
3    100.0
4    240.0
5    100.0
6    240.0
7    100.0
8    240.0
Name: Total_to_Add, dtype: float64

df['New']=s
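The float64 dtype in the output above comes from mask, which replaces the non-B values with NaN. If integer totals are preferred, the result can be cast back once the sums are computed. The sketch below uses a small made-up frame (the ref3 row is hypothetical, not from the question) to also cover a reference that has no 'B' rows.

```python
import pandas as pd

# Hypothetical data: ref2 and ref3 have no 'B' rows.
df = pd.DataFrame({
    'Ref_No': ['ref1', 'ref2', 'ref1', 'ref3'],
    'Definition': ['B', 'A', 'B', 'A'],
    'Total_to_Add': [20, 30, 40, 70],
})

# mask() turns non-B amounts into NaN; the group sum skips NaN values.
s = (df['Total_to_Add']
     .mask(df['Definition'] != 'B')
     .groupby(df['Ref_No'])
     .transform('sum'))

# The sums contain no NaN, so casting to an integer dtype is safe here.
df['New'] = s.astype('int64')

print(df)
```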

