I have a dataframe, df, where I would like to perform an addition based on values in a different column. The calculation is grouped by the id value.
Data
id  date   value  start  status
aa  Q1.22  hello     80
aa  Q2.22  hi      -200     2000
aa  Q3.22  hey     -400
aa  Q4.22  sure    -500
bb  Q3.23  ok        30
bb  Q4.23  sure      10      100
Desired
id  date   value  start  status  update
aa  Q1.22  hello     80
aa  Q2.22  hi      -200    2000    1800
aa  Q3.22  hey     -400            1400
aa  Q4.22  sure    -500             900
bb  Q3.23  ok        30
bb  Q4.23  sure      10     100     110
Logic
The addition begins at the first (and only) non-empty value in the 'status' column for each id. For example, 2000 is the single status value for id aa, so the calculation starts there; likewise 100 for id bb, even though it appears on the last row. From that point on, each value in the 'start' column is added to the running 'status' total. The calculation is grouped by id and ends when the id changes.
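The rule can be checked with a short plain-Python sketch of the same arithmetic (the rows list below is a hand-copied version of the sample data):

```python
# Sketch of the desired per-id running total, without pandas.
rows = [
    {"id": "aa", "start": 80,   "status": None},
    {"id": "aa", "start": -200, "status": 2000},
    {"id": "aa", "start": -400, "status": None},
    {"id": "aa", "start": -500, "status": None},
    {"id": "bb", "start": 30,   "status": None},
    {"id": "bb", "start": 10,   "status": 100},
]

running = {}  # current total per id, once a status value has been seen
for row in rows:
    if row["status"] is not None:
        running[row["id"]] = row["status"]   # seed the total at the status value
    if row["id"] in running:
        running[row["id"]] += row["start"]   # keep adding each start value
        row["update"] = running[row["id"]]
    else:
        row["update"] = None                 # rows before the first status stay empty

print([r["update"] for r in rows])  # [None, 1800, 1400, 900, None, 110]
```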
Doing
df = df.groupby(["id"]).sum()
df['update'] = df.groupby(["id"]).sum()['start']
However, this aggregates the rows away instead of producing a per-row running total. I am still researching this; any suggestion is appreciated.
Use to_numeric to convert status to numeric (non-numeric entries become NaN), then groupby + ffill the status column; this carries the numeric value forward within each id, leaving the leading NaN as NaN. Use Series.mask to remove the unwanted values from start (those before the first status), then groupby + cumsum them and add the result to the forward-filled status:

df['status'] = pd.to_numeric(df['status'], errors='coerce')
status = df.groupby('id')['status'].ffill()
df['update'] = (
    status + df['start'].mask(status.isna()).groupby(df['id']).cumsum()
)
df:
id date value start status update
0 aa Q1.22 hello 80 NaN NaN
1 aa Q2.22 hi -200 2000.0 1800.0
2 aa Q3.22 hey -400 NaN 1400.0
3 aa Q4.22 sure -500 NaN 900.0
4 bb Q3.23 ok 30 NaN NaN
5 bb Q4.23 sure 10 100.0 110.0
Optionally, use DataFrame.fillna to add back the empty strings:

df = df.fillna('')
df:
id date value start status update
0 aa Q1.22 hello 80
1 aa Q2.22 hi -200 2000.0 1800.0
2 aa Q3.22 hey -400 1400.0
3 aa Q4.22 sure -500 900.0
4 bb Q3.23 ok 30
5 bb Q4.23 sure 10 100.0 110.0
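For reference, here is the whole answer run end to end on a reconstructed copy of the sample frame (the empty strings stand in for the blank cells in the question):

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "id":     ["aa", "aa", "aa", "aa", "bb", "bb"],
    "date":   ["Q1.22", "Q2.22", "Q3.22", "Q4.22", "Q3.23", "Q4.23"],
    "value":  ["hello", "hi", "hey", "sure", "ok", "sure"],
    "start":  [80, -200, -400, -500, 30, 10],
    "status": ["", 2000, "", "", "", 100],
})

df["status"] = pd.to_numeric(df["status"], errors="coerce")  # "" -> NaN
status = df.groupby("id")["status"].ffill()                  # carry 2000 / 100 forward per id
df["update"] = (
    status + df["start"].mask(status.isna()).groupby(df["id"]).cumsum()
)
print(df["update"].tolist())  # [nan, 1800.0, 1400.0, 900.0, nan, 110.0]
```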
Breakdown of Steps:
(Step 2: forward fill status in groups)
df.groupby('id')['status'].ffill()
0 NaN
1 2000.0
2 2000.0
3 2000.0
4 NaN
5 100.0
Name: status, dtype: float64
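The groupby matters here: a plain ffill() would carry aa's 2000 into bb's first row, while the grouped version keeps each id's fill separate. A small check with the same values:

```python
import pandas as pd

s = pd.Series([None, 2000, None, None, None, 100], dtype="float64")
ids = pd.Series(["aa", "aa", "aa", "aa", "bb", "bb"])

# Ungrouped fill leaks 2000 into the first bb row (index 4).
print(s.ffill().tolist())               # [nan, 2000.0, 2000.0, 2000.0, 2000.0, 100.0]
# Grouped fill keeps it NaN.
print(s.groupby(ids).ffill().tolist())  # [nan, 2000.0, 2000.0, 2000.0, nan, 100.0]
```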
(Step 3.1: remove unwanted values from start)
df['start'].mask(status.isna())
0 NaN
1 -200.0
2 -400.0
3 -500.0
4 NaN
5 10.0
Name: start, dtype: float64
(Step 3.2: cumsum the kept values in groups)
df['start'].mask(status.isna()).groupby(df['id']).cumsum()
0 NaN
1 -200.0
2 -600.0
3 -1100.0
4 NaN
5 10.0
Name: start, dtype: float64
(Step 4: add to the forward-filled status)
status + df['start'].mask(status.isna()).groupby(df['id']).cumsum()
0 NaN
1 1800.0
2 1400.0
3 900.0
4 NaN
5 110.0
dtype: float64
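If the same logic is needed more than once, the steps above can be bundled into a small helper; the function name and keyword defaults below are made up for illustration:

```python
import pandas as pd

def add_running_status(df, group_col="id", start_col="start", status_col="status"):
    """Seed a running total at the first numeric status per group, then add
    each start value to it. (Hypothetical helper name, not a pandas API.)"""
    out = df.copy()
    out[status_col] = pd.to_numeric(out[status_col], errors="coerce")
    status = out.groupby(group_col)[status_col].ffill()
    out["update"] = status + (
        out[start_col].mask(status.isna()).groupby(out[group_col]).cumsum()
    )
    return out
```

Calling add_running_status(df) on the sample frame reproduces the update column shown above.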
Try something new
idx = df.groupby('id').status.transform('idxmax')
df['new'] = (
    df.status.add(df.start, fill_value=0)[df.index >= idx]
    .groupby(df['id'])
    .cumsum()
)
df
Out[270]:
id date value start status new
0 aa Q1.22 hello 80 NaN NaN
1 aa Q2.22 hi -200 2000.0 1800.0
2 aa Q3.22 hey -400 NaN 1400.0
3 aa Q4.22 sure -500 NaN 900.0
4 bb Q3.23 ok 30 NaN NaN
5 bb Q4.23 sure 10 100.0 110.0
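This variant relies on groupby.transform accepting the 'idxmax' reduction (available in recent pandas versions). A self-contained reproduction on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "id":     ["aa", "aa", "aa", "aa", "bb", "bb"],
    "start":  [80, -200, -400, -500, 30, 10],
    "status": [None, 2000, None, None, None, 100],
})

# Index of the single non-NaN status per group, broadcast to every row.
idx = df.groupby("id").status.transform("idxmax")
# From that row onward, accumulate status + start within each id;
# rows before the start point fall out of the selection and stay NaN.
df["new"] = (
    df.status.add(df.start, fill_value=0)[df.index >= idx]
    .groupby(df["id"])
    .cumsum()
)
print(df["new"].tolist())
```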