I have a dataset, df, where I wish to calculate the percent increase of the sum of a particular group over a time period. Here is the dataset:
date size type
1/1/2020 3 a
1/1/2020 13 b
1/1/2020 1 c
2/1/2020 51 a
2/1/2019 10 b
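For reproducibility, the table above could be built as a DataFrame like this. It is a minimal sketch: the dates are kept as strings and are assumed (the post does not say) to be in month/day/year order.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: dates are month/day/year strings; they are not parsed here.
df = pd.DataFrame({
    "date": ["1/1/2020", "1/1/2020", "1/1/2020", "2/1/2020", "2/1/2019"],
    "size": [3, 13, 1, 51, 10],
    "type": ["a", "b", "c", "a", "b"],
})
```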
Desired output: for each type, the difference and percent difference in size between its earliest and latest date:
date diff percentdiff type
2/1/2020 48 1600 a
1/1/2020 3 30 b
1/1/2020 0 0 c
We see that group 'a' went from 3 to 51 (from 1/1/2020 to 2/1/2020), which gives a difference of 48 and a percent difference of 1600%. Group 'b' went from 10 (2/1/2019) to 13 (1/1/2020), a difference of 3 and a percent difference of 30%. Group 'c' is 0 because there is no change.
Percent increase/change is (final - initial) / initial * 100.
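As a quick check of the formula against the numbers in the desired output, here is a small sketch in plain Python. The helper name `percent_change` is hypothetical and used only for illustration.

```python
def percent_change(initial, final):
    """Percent change from initial to final: (final - initial) / initial * 100."""
    return (final - initial) / initial * 100


# Values taken from the sample data: earliest and latest size per type.
change_a = percent_change(3, 51)   # group 'a': 3 -> 51
change_b = percent_change(10, 13)  # group 'b': 10 -> 13
change_c = percent_change(1, 1)    # group 'c': a single row, so no change
```

Note that the formula is undefined when the initial value is 0, so such groups would need separate handling.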
This is what I have tried:
df1 = df.groupby(['type','date'])['size'].agg(lambda x:
(x.iloc[-1]/x.iloc[0]-1)*100).to_frame('increase')
df1['diff'] = df.groupby(['type','date']).agg(lambda x:x.iloc[-1]-x.iloc[0])
I am still researching this. Any suggestion is appreciated.
There is probably a more concise solution, but this works:
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# Sort by date so the first/last row of each group is its earliest/latest observation
grouped = df.sort_values('date').groupby('type')
first = grouped['size'].first()
last = grouped['size'].last()
output = pd.DataFrame({
    'date': grouped['date'].last().values,
    # Compare latest with earliest, so groups with more than two rows are handled too
    'diff': (last - first).values,
    'percentdiff': ((last - first) / first * 100).values,
    'type': first.index.to_numpy()
})
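A possibly more concise variant is sketched below using named aggregation. It is self-contained, so it rebuilds the sample data, and it assumes the dates are month/day/year strings. The names `summary`, `initial` and `final` are illustrative, not from the original post.

```python
import pandas as pd

# Sample data from the question (assumed month/day/year date strings).
df = pd.DataFrame({
    "date": ["1/1/2020", "1/1/2020", "1/1/2020", "2/1/2020", "2/1/2019"],
    "size": [3, 13, 1, 51, 10],
    "type": ["a", "b", "c", "a", "b"],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# After sorting by date, "first"/"last" pick each group's earliest/latest row.
summary = (
    df.sort_values("date")
      .groupby("type")
      .agg(date=("date", "last"),
           initial=("size", "first"),
           final=("size", "last"))
      .reset_index()
)
summary["diff"] = summary["final"] - summary["initial"]
summary["percentdiff"] = summary["diff"] / summary["initial"] * 100

# Keep only the columns from the desired output, in the same order.
summary = summary[["date", "diff", "percentdiff", "type"]]
```

Comparing the first and last value per group measures the change from the earliest date even when a group has more than two rows, whereas `diff()` and `pct_change()` compare each row only with the one before it.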