Pandas sum by date indexed but exclude totals column

Question

I have a dataframe that is being read from database records and looks like this:

date          added    total
2020-09-14        5        5
2020-09-15        4        9
2020-09-16        2       11

I need to be able to resample by different periods and this is what I am using:

import pandas as pd

# raw_data holds the records fetched from the database
df = pd.DataFrame.from_records(raw_data, index='date')
df.index = pd.to_datetime(df.index)

# let's say I want a yearly sample, then I would do
df = df.fillna(0).resample('Y').sum()

This almost works, but it also sums the total column, which I don't want. I need the total column to hold the value from the last date in each sampled period, like this:

# What I need
date    added    total
2020       11       11

# What I'm getting
date    added    total
2020       11       25

Answer 1

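You can do this by aggregating each column differently when you resample. Here you want the sum() aggregator for the added column, but max() for the total column. As long as total is a running total that never decreases, its maximum within a period is the value at the end of that period.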

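import pandas as pd

df = pd.DataFrame({'date': [20200914, 20200915, 20200916, 20210101, 20210102],
                   'added': [5, 4, 2, 1, 6],
                   'total': [5, 9, 11, 1, 7]})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

# sum the daily additions, but keep the largest running total in each year
# ('Y' is the year-end alias; pandas 2.2+ prefers 'YE')
df_res = df.resample('Y', on='date').agg({'added': 'sum', 'total': 'max'})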

And the result is:

df_res
            added  total
date                    
2020-12-31     11     11
2021-12-31      7      7
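
If total can ever go down (for example, when added is negative), max() would no longer pick the end-of-period value. A variant that may fit better in that case is to take the last row of each period instead. The sketch below is only one way to do it: it keeps the date index from the question, groups by calendar period with to_period() rather than resample() (a swapped-in technique that also gives the 2020-style label the question asks for), and wraps the logic in a hypothetical helper named summarize so the period can be changed. The raw_data rows are copied from the question and stand in for the database records.

```python
import pandas as pd

# Sample rows copied from the question; in practice these come from the database.
raw_data = [
    {"date": "2020-09-14", "added": 5, "total": 5},
    {"date": "2020-09-15", "added": 4, "total": 9},
    {"date": "2020-09-16", "added": 2, "total": 11},
]

df = pd.DataFrame.from_records(raw_data, index="date")
df.index = pd.to_datetime(df.index)


def summarize(frame, freq):
    """Hypothetical helper: sum 'added' and keep the last 'total' per period."""
    frame = frame.sort_index()  # 'last' relies on chronological order
    periods = frame.index.to_period(freq)  # "Y" for years, "M" for months
    return frame.groupby(periods).agg({"added": "sum", "total": "last"})


yearly = summarize(df, "Y")
monthly = summarize(df, "M")
print(yearly)
```

Passing a different period alias to summarize (such as "M" or "Q") covers the other sampling periods mentioned in the question without touching the aggregation itself.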
