
Get sum of two columns based on conditions of other columns in a Pandas Dataframe

I have the following DataFrame:

import pandas as pd

data = {"Subject":["1","2","3","3","4","5","5"],
        "date": ["2020-05-01 16:54:25","2020-05-03 10:31:18","2020-05-08 10:10:40","2020-05-08 10:10:42","2020-05-06 09:30:40","2020-05-07 12:46:30","2020-05-07 12:55:10"],
        "Accept": ["True","False","True","True","False","True","True"],
        "Amount" : [150,30,32,32,300,100,50],
        "accept_1": ["True","False","True","True","False","True","True"],
        "amount_1" : [20,30,32,32,150,100,30]}
data = pd.DataFrame(data)

I want to group the data by Subject and date, and then, for each subject, compute the sum of Amount and amount_1 when both Accept and accept_1 are true.

The True/False values here are strings, not booleans.

I tried the following code:

def PPP(tx_amount_1, tx_accepted_1, tx_amount, tx_accepted):
    if tx_accepted_1 and tx_accepted == "True":
        return tx_amount + tx_amount_1

example = data.groupby(["Subject", "date"])[["Accept", "Amount", "accept_1", "amount_1"]].apply(
    lambda x: PPP(x["amount_1"], x["accept_1"], x["Amount"], x["Accept"]))

I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
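
The message comes from the `and` inside `PPP`: each group passes whole columns (Series) into the function, and Python's `and` needs a single True/False from `tx_accepted_1`, which pandas refuses to guess. A minimal sketch on a hypothetical two-row frame `df` (not the data above) that reproduces the error and shows the element-wise form with `&`:

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the error
df = pd.DataFrame({"Accept": ["True", "False"], "accept_1": ["True", "True"]})

try:
    # `and` calls bool() on the left-hand Series, which raises ValueError
    df["Accept"] and df["accept_1"] == "True"
    raised = False
except ValueError:
    raised = True

# Element-wise alternative: compare each column, then combine with &
both_true = (df["Accept"] == "True") & (df["accept_1"] == "True")

print(raised)
print(both_true.tolist())
```

Note that the original condition also compares only `tx_accepted` to "True"; the left operand is tested for truthiness as-is, so each column needs its own comparison.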

IIUC, first select the rows that match the condition with boolean indexing, then apply GroupBy.sum:

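mask = data['Accept'].eq('True') & data['accept_1'].eq('True')
# numeric_only=True keeps only Amount and amount_1; since pandas 2.0 the
# default would also try to sum the remaining string columns
data[mask].groupby(['Subject', pd.to_datetime(data['date']).dt.normalize()]).sum(numeric_only=True)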

Output:

                    Amount  amount_1
Subject date                        
1       2020-05-01     150        20
3       2020-05-08      64        64
5       2020-05-07     150       130

If you want a grand total:

mask = data['Accept'].eq('True') & data['accept_1'].eq('True')
(data[mask]
 .groupby(['Subject', pd.to_datetime(data['date']).dt.normalize()])
 .sum(numeric_only=True)  # sum Amount and amount_1 per group
 .sum(axis=1)             # add the two columns together
 .reset_index(name='Total')
)

Output:

  Subject       date  Total
0       1 2020-05-01    170
1       3 2020-05-08    128
2       5 2020-05-07    280
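
An equivalent way to get the same totals, sketched here as an alternative rather than as the method above, is to add the two amounts per row first and then group. The column names `day` and `Total` are chosen for illustration only:

```python
import pandas as pd

data = pd.DataFrame({
    "Subject": ["1", "2", "3", "3", "4", "5", "5"],
    "date": ["2020-05-01 16:54:25", "2020-05-03 10:31:18", "2020-05-08 10:10:40",
             "2020-05-08 10:10:42", "2020-05-06 09:30:40", "2020-05-07 12:46:30",
             "2020-05-07 12:55:10"],
    "Accept": ["True", "False", "True", "True", "False", "True", "True"],
    "Amount": [150, 30, 32, 32, 300, 100, 50],
    "accept_1": ["True", "False", "True", "True", "False", "True", "True"],
    "amount_1": [20, 30, 32, 32, 150, 100, 30],
})

# Rows where both flags are the string "True"
mask = data["Accept"].eq("True") & data["accept_1"].eq("True")
# Timestamps truncated to midnight, so rows from the same day share a key
day = pd.to_datetime(data["date"]).dt.normalize()

total = (
    data[mask]
    .assign(day=day, Total=lambda d: d["Amount"] + d["amount_1"])  # per-row sum
    .groupby(["Subject", "day"], as_index=False)["Total"]
    .sum()
)
print(total)
```

Selecting the single `Total` column before summing avoids touching the string columns at all, so no `numeric_only` argument is needed.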

To keep the groups that do not match the condition, with their amounts set to 0:

mask = data['Accept'].eq('True') & data['accept_1'].eq('True')
cols = ['Amount', 'amount_1']
(data
 # replace the amounts with 0 on rows where the condition is not met
 .assign(**{c: data[c].where(mask, 0) for c in cols})
 .groupby(['Subject', pd.to_datetime(data['date']).dt.normalize()])
 .sum(numeric_only=True)
 #.sum(axis=1).reset_index(name='Total') # uncomment for grand-total
)

Output:

                    Amount  amount_1
Subject date                        
1       2020-05-01     150        20
2       2020-05-03       0         0
3       2020-05-08      64        64
4       2020-05-06       0         0
5       2020-05-07     150       130
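
Since the flags are stored as strings, another option is to convert them to real booleans once and build the mask from those. A minimal sketch on a hypothetical three-row frame `df`; the mapping `to_bool` is an illustrative helper, and any value other than "True"/"False" would become NaN under it:

```python
import pandas as pd

# Hypothetical frame with string flags, shaped like the question's columns
df = pd.DataFrame({
    "Accept": ["True", "False", "True"],
    "accept_1": ["True", "True", "False"],
})

# Explicit mapping from the two expected strings to booleans
to_bool = {"True": True, "False": False}
flags = df[["Accept", "accept_1"]].apply(lambda col: col.map(to_bool))

# With real booleans, "both flags are true" is a row-wise all()
mask = flags.all(axis=1)
print(mask.tolist())
```

The resulting `mask` can be used exactly like the one built with `.eq('True')` above.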
