Group and aggregate by multiple columns using Pandas NamedAgg

Question

I have a dataframe whose columns are arranged by date: each column holds the readings taken on one day, and the data spans more than a year. I am trying to group and aggregate this data to show quarterly totals. I found that pandas NamedAgg might support this, but I am struggling to pass multiple column names to it and apply a single aggregate function.

My sample dataset has city and zip columns, followed by one column per day for every day between 2020 and 2021.

Here is what I am trying to achieve, and given below is an example of what I tried by passing multiple columns to the NamedAgg method, but it doesn't seem to accept it:

df.groupby(['city','zip']).agg(
  2021_q1=pd.NamedAgg(column=df.columns[1:89].values.tolist(),aggfunc=sum),
  2021_q2=pd.NamedAgg(column=df.columns[90:180].values.tolist(),aggfunc=sum),
  2021_q3=pd.NamedAgg(column=df.columns[181:240].values.tolist(),aggfunc=sum),
  2021_q4=pd.NamedAgg(column=df.columns[241:380].values.tolist(),aggfunc=sum),
  2022_q1=pd.NamedAgg(column=df.columns[381:450].values.tolist(),aggfunc=sum),
)

I get the error

TypeError: unhashable type: 'list'

Is there another way I should be passing the list of columns I want to aggregate? Or, if there is a better way to aggregate my dataset into quarterly numbers, please suggest it.

Answer 1

Move the non-date columns into the index, convert the remaining column labels to datetimes, and then sum the columns grouped by quarter periods, which you can get with DatetimeIndex.to_period:

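import pandas as pd

# Move the non-date columns into the index so only date columns remain
df = df.set_index(['city','zip'])

# Parse the date column labels as datetimes
df.columns = pd.to_datetime(df.columns)

# Group the columns by quarter and sum within each quarter
df1 = df.groupby(df.columns.to_period('Q'), axis=1).sum()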
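To see the approach end to end, here is a minimal, self-contained sketch on made-up data. The sample frame, the names wide and quarterly, and the "2021_q1" label format are assumptions for illustration, not taken from the question's real dataset. Because newer pandas releases warn about groupby(..., axis=1), this sketch transposes the frame, groups the rows, and transposes back, which should give the same result:

```python
import pandas as pd

# Hypothetical sample data: one column per day, labelled with a date string.
dates = pd.date_range("2021-01-01", "2021-06-30", freq="D")
df = pd.DataFrame({d.strftime("%Y-%m-%d"): [1, 2] for d in dates})
df.insert(0, "city", ["A", "B"])
df.insert(1, "zip", ["10001", "20002"])

# Keep only the date columns as data; city and zip become the row index.
wide = df.set_index(["city", "zip"])
wide.columns = pd.to_datetime(wide.columns)

# Transpose so dates are rows, group them by quarter, sum, and transpose back.
quarterly = wide.T.groupby(wide.columns.to_period("Q")).sum().T

# Optional: turn the Period labels into names like "2021_q1".
quarterly.columns = [f"{p.year}_q{p.quarter}" for p in quarterly.columns]

print(quarterly)
```

The NamedAgg attempt fails because its column argument expects a single column label rather than a list of them, so grouping the date columns themselves avoids the problem without listing column slices by hand.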

solution1
0 2021-10-21 06:09:48