I have a dataframe that is being read from database records and looks like this:
date added total
2020-09-14 5 5
2020-09-15 4 9
2020-09-16 2 11
I need to be able to resample by different periods, and this is what I am using:
df = pd.DataFrame.from_records(raw_data, index='date')
df.index = pd.to_datetime(df.index)
# let's say I want yearly sample, then I would do
df = df.fillna(0).resample('Y').sum()
This almost works, but it also sums the total column, which I don't want. I need the total column to be the value at the date sampled in the dataframe, like this:
# What I need
date added total
2020 11 11
# What I'm getting
date added total
2020 11 25
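You can do this by aggregating each column differently when resampling. Here you want the sum() aggregator for the added column, but max() for the total column. This relies on total never decreasing within a period, as in the sample data, so that its maximum is also its value on the last date of that period.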
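import pandas as pd

df = pd.DataFrame({'date': [20200914, 20200915, 20200916, 20210101, 20210102],
                   'added': [5, 4, 2, 1, 6],
                   'total': [5, 9, 11, 1, 7]})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
# Sum 'added' over each year, but take the largest 'total' in that year.
# ('Y' is the year-end alias; pandas 2.2+ prefers 'YE'.)
df_res = df.resample('Y', on='date').agg({'added': 'sum', 'total': 'max'})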
And the result is:
df_res
            added  total
date
2020-12-31     11     11
2021-12-31      7      7
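If total could ever go down within a period, 'last' may be closer to what the question asks for than 'max', since it picks the value on the latest date in each period. Below is a minimal sketch of that variant applied to the date-indexed frame from the question. The raw_data list is a hypothetical stand-in for the database records, the alias check is only there because the year-end alias differs between pandas versions, and replacing the index with the year is an assumption about the desired output shape.

```python
import pandas as pd

# Hypothetical stand-in for the database records in the question.
raw_data = [
    {"date": "2020-09-14", "added": 5, "total": 5},
    {"date": "2020-09-15", "added": 4, "total": 9},
    {"date": "2020-09-16", "added": 2, "total": 11},
]

df = pd.DataFrame.from_records(raw_data, index="date")
df.index = pd.to_datetime(df.index)
df = df.sort_index()  # make sure 'last' means the latest date in each period

# Newer pandas spells the year-end alias 'YE'; older versions only accept 'Y'.
try:
    pd.tseries.frequencies.to_offset("YE")
    freq = "YE"
except ValueError:
    freq = "Y"

# Sum the increments, but keep the running total from the last row of each year.
yearly = df.fillna(0).resample(freq).agg({"added": "sum", "total": "last"})

# Assumption: label each row by its year rather than the year-end timestamp.
yearly.index = yearly.index.year
print(yearly)
```

The same dictionary of aggregators should work for other periods by changing the frequency alias; only the final index relabeling is specific to yearly output.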