
Cumulative Sum of a column based on values in another column?

I have a data frame similar to the one below. I would like to get the cumulative sum of the column 'Number', restarting every time the column 'Name' is equal to 'AAAA'. In other words, I want the running total of 'Number' within each block that starts at an 'AAAA' row, so the sum starts again each time 'AAAA' appears in the column. Is there a way to do that?

import pandas as pd

data = {'Name':  ['AAAA','B','C','D','E','AAAA','O','C','D','E','AAAA','D', 'C','D','E','AAAA','B','C','D','E','AAAA','L','M'],
    'Number': [7,8,9,10,1,1,2,34,5,6,7,8,9,10,1,1,7,8,2,3,5,6,7]
    }

df = pd.DataFrame(data, columns=['Name', 'Number'])
# A plain cumsum runs over the whole column and never resets at 'AAAA'
df['Sum_Cummulative'] = df['Number'].cumsum()
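To pin down the expected result, the intended reset logic can be written as a plain-Python loop. This is a minimal sketch for illustration only: the function name reset_cumsum and its marker parameter are hypothetical, and it assumes the 'AAAA' row's own Number is included in the new running total (which matches the expected output shown in the answer).

```python
def reset_cumsum(names, numbers, marker="AAAA"):
    """Hypothetical helper: running total of `numbers` that restarts
    whenever the matching entry in `names` equals `marker`."""
    totals = []
    running = 0
    for name, value in zip(names, numbers):
        if name == marker:
            running = 0  # a marker row starts a new block
        running += value  # the marker row's own value is counted
        totals.append(running)
    return totals


names = ['AAAA', 'B', 'C', 'D', 'E', 'AAAA', 'O', 'C']
numbers = [7, 8, 9, 10, 1, 1, 2, 34]
print(reset_cumsum(names, numbers))  # [7, 15, 24, 34, 35, 1, 3, 37]
```

A loop like this is easy to read but slow on large frames; the vectorized groupby approach avoids the Python-level iteration.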


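Use GroupBy.cumsum with a helper Series, created by comparing Name to 'AAAA' and taking Series.cumsum of the result. Each 'AAAA' row increments the helper value, so it starts a new group and the running total restarts there: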

df['Sum_Cummulative'] = df.groupby(df['Name'].eq('AAAA').cumsum())['Number'].cumsum()
print(df)
    Name  Number  Sum_Cummulative
0   AAAA       7                7
1      B       8               15
2      C       9               24
3      D      10               34
4      E       1               35
5   AAAA       1                1
6      O       2                3
7      C      34               37
8      D       5               42
9      E       6               48
10  AAAA       7                7
11     D       8               15
12     C       9               24
13     D      10               34
14     E       1               35
15  AAAA       1                1
16     B       7                8
17     C       8               16
18     D       2               18
19     E       3               21
20  AAAA       5                5
21     L       6               11
22     M       7               18
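The key step is the helper Series passed to groupby. The sketch below runs the same expression on a shortened, made-up subset of the data (the rows are illustrative, not from the question) and prints the intermediate block ids alongside the restarted running totals.

```python
import pandas as pd

# Shortened, illustrative data in the same shape as the question's frame
df = pd.DataFrame({
    'Name': ['AAAA', 'B', 'C', 'AAAA', 'O', 'AAAA', 'L'],
    'Number': [7, 8, 9, 1, 2, 5, 6],
})

# True on every 'AAAA' row; cumsum turns each True into a new block id
group_id = df['Name'].eq('AAAA').cumsum()
print(group_id.tolist())  # [1, 1, 1, 2, 2, 3, 3]

# Running total of Number computed separately within each block
df['Sum_Cummulative'] = df.groupby(group_id)['Number'].cumsum()
print(df['Sum_Cummulative'].tolist())  # [7, 15, 24, 1, 3, 5, 11]
```

Because the id only increments on 'AAAA' rows, any rows before the first 'AAAA' get id 0 and form their own block, so their running total is still computed rather than dropped.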
