简体   繁体   中英

pandas DataFrame: Calculate Sum based on boolean values in another column

I am fairly new to Python and I trying to simulate the following logic with in pandas

在此处输入图像描述

I am currently looping throw the rows and want to sum the values in the AMOUNT column in the prior rows but only till the last seen 'TRUE' value. It seems inefficient with the actual data (I have a dataframe of about 5 million rows)? Was wondering what the efficient way of handling such a logic in Python would entail?

Logic: The logic is that if FLAG is TRUE I want to sum the values in the AMOUNT column in the prior rows but only till the last seen 'TRUE' value. Basically sum the values in 'AMOUNT' between the rows where FLAG is TRUE

Check with cumsum and transform sum

df['SUM']=df.groupby(df['FLAG'].cumsum()).Amount.transform('sum').where(df.FLAG)

maybe try something around the following:

import pandas

df = pd.read_csv('name of file.csv')

df['AMOUNT'].sum()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM