
Group and aggregate by multiple columns using Pandas NamedAgg

I have a DataFrame whose columns are dates: each column holds the readings taken on one day, covering more than a year. I want to group and aggregate this data into quarterly totals. pandas NamedAgg looked like it might support this, but I am struggling to pass multiple column names to it and apply a single aggregate function.

My sample dataset has city and zip columns, followed by one column per day between 2020 and 2021.


Below is what I tried: passing a list of columns to each pd.NamedAgg, but it does not seem to accept a list:

import pandas as pd

# keyword names must be valid Python identifiers, so q1_2021 rather than 2021_q1
df.groupby(['city','zip']).agg(
  q1_2021=pd.NamedAgg(column=df.columns[1:89].values.tolist(), aggfunc=sum),
  q2_2021=pd.NamedAgg(column=df.columns[90:180].values.tolist(), aggfunc=sum),
  q3_2021=pd.NamedAgg(column=df.columns[181:240].values.tolist(), aggfunc=sum),
  q4_2021=pd.NamedAgg(column=df.columns[241:380].values.tolist(), aggfunc=sum),
  q1_2022=pd.NamedAgg(column=df.columns[381:450].values.tolist(), aggfunc=sum),
)

I get the error

TypeError: unhashable type: 'list'

Is there another way to pass the list of columns I want to aggregate, or is there a better way to aggregate my dataset into quarterly totals?
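A minimal sketch of why the attempt fails and how NamedAgg can still be used. pd.NamedAgg(column, aggfunc) expects a single (hashable) column label; a list is not hashable, which is what the TypeError reports. The sketch first collapses each quarter's day columns into one column with a row-wise sum, deriving the quarter from the column labels rather than from hand-written positional slices (slices such as df.columns[1:89] are end-exclusive, so the column at position 89 falls into no quarter), and then applies one NamedAgg per quarter column. The data, the 'YYYY-MM-DD' column labels and the city/zip values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: city, zip, then one column per day of 2021,
# labelled 'YYYY-MM-DD'; every reading is 1.0 so totals equal day counts.
dates = pd.date_range('2021-01-01', '2021-12-31', freq='D')
df = pd.DataFrame(np.ones((3, len(dates))), columns=dates.strftime('%Y-%m-%d'))
df.insert(0, 'city', ['A', 'A', 'B'])
df.insert(1, 'zip', ['10001', '10001', '20002'])

# Derive each day column's quarter from its label instead of positional slices.
date_cols = df.columns[2:]
quarter_of_col = pd.to_datetime(date_cols).to_period('Q')

# Collapse each quarter's day columns into ONE column (row-wise sum),
# because NamedAgg(column=...) takes a single label, not a list.
quarterly = pd.DataFrame({
    'city': df['city'],
    'zip': df['zip'],
    **{str(q): df[date_cols[quarter_of_col == q]].sum(axis=1)
       for q in quarter_of_col.unique()},
})

# One NamedAgg per quarter column; labels like '2021Q1' are not valid Python
# identifiers, so the keyword arguments are passed via dict unpacking.
out = quarterly.groupby(['city', 'zip']).agg(**{
    q: pd.NamedAgg(column=q, aggfunc='sum') for q in quarterly.columns[2:]
})
print(out)
```

With every reading set to 1.0, the (A, 10001) group (two rows) gets 180.0 for 2021Q1, i.e. twice the 90 days of that quarter.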

Move the non-date columns into the index, convert the remaining column labels to datetimes, and then sum the columns per quarter, using DatetimeIndex.to_period to map each date to its quarter:

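import pandas as pd

# move the non-date columns into the index so only date columns remain
df = df.set_index(['city','zip'])
# parse the column labels into a DatetimeIndex
df.columns = pd.to_datetime(df.columns)
# sum the columns per quarter (transposing avoids groupby(axis=1), deprecated since pandas 2.1)
df1 = df.T.groupby(df.columns.to_period('Q')).sum().T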
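A self-contained run of the same approach on hypothetical data: two rows and one column per day from 2020-01-01 to 2021-12-31, every reading 1.0, so each quarterly total equals the number of days in that quarter. The grouping step uses the transpose form, which gives the same result as groupby(..., axis=1) and also works in pandas versions where axis=1 is deprecated (2.1+). The final renaming to the question's 2020_q1 style is an optional addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input shaped like the question: city, zip, then one column per
# day labelled 'YYYY-MM-DD'; every reading is 1.0.
dates = pd.date_range('2020-01-01', '2021-12-31', freq='D')
df = pd.DataFrame(np.ones((2, len(dates))), columns=dates.strftime('%Y-%m-%d'))
df.insert(0, 'city', ['A', 'B'])
df.insert(1, 'zip', ['10001', '20002'])

# Same steps as the answer: index the identifiers, parse dates, sum per quarter.
df = df.set_index(['city', 'zip'])
df.columns = pd.to_datetime(df.columns)
df1 = df.T.groupby(df.columns.to_period('Q')).sum().T

# Optional: rename Period labels such as 2020Q1 to '2020_q1'.
df1.columns = [f'{p.year}_q{p.quarter}' for p in df1.columns]
print(df1)
```

Because 2020 is a leap year, 2020_q1 comes out as 91.0 while 2021_q1 is 90.0.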
