Pandas resample sum on a date index, but exclude the total column

I have a DataFrame built from database records that looks like this:

date          added    total
2020-09-14        5        5
2020-09-15        4        9
2020-09-16        2        11

I need to resample by different periods, and this is what I am using:

import pandas as pd

df = pd.DataFrame.from_records(raw_data, index='date')
df.index = pd.to_datetime(df.index)

# e.g. for a yearly sample:
df = df.fillna(0).resample('Y').sum()

This almost works, but it also sums the total column, which I don't want. The total column should keep its value as of the last date in each sampled period, like this:

# What I need
date    added    total
2020       11       11

# What I'm getting
date    added    total
2020       11       25

You can do this by aggregating each column differently. Here you want the sum() aggregator for the added column but max() for total: since total is a running total that only grows, its maximum within each period is also its latest value.

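import pandas as pd

df = pd.DataFrame({'date': [20200914, 20200915, 20200916, 20210101, 20210102],
                   'added': [5, 4, 2, 1, 6],
                   'total': [5, 9, 11, 1, 7]})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

# Sum 'added' per year, but keep the largest (i.e. latest) running 'total'.
# 'Y' = year-end bins; pandas >= 2.2 prefers the alias 'YE'.
df_res = df.resample('Y', on='date').agg({'added': 'sum', 'total': 'max'})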

And the result is:

df_res
            added  total
date                    
2020-12-31     11     11
2021-12-31      7      7
