
Pandas Groupby using different agg methods for different columns

Here is the scenario:

  • I have a large ordered dataset with 314 columns and over 300,000 rows for an ML problem.

  • I want to group the dataset by column X (Supplier).

  • One column is a datetime type, some columns are numeric by nature and others were one-hot encoded from some categorical columns.

Desired output:

  • I want to group by column X and aggregate the numeric columns by "mean", some columns by "last", and the one-hot-encoded ones by "sum", all in a single agg call.

Since the dataset has 314 columns, I can't practically write out a dict that lists every column by hand.

df_train.groupby('Supplier').agg({<some columns> : 'last', <some columns>: 'sum', <some columns>: 'mean' })

PS: I ordered the columns in the sequence in which I want to apply the different aggregations.

You could use select_dtypes to get the columns that are numeric, and use these in a dictionary comprehension.

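# 'number' selects every numeric dtype ('numeric' is not a valid selector)
numeric_cols = df_train.select_dtypes('number').columns

# Skip the grouping key, which is not available as a column inside agg().
# Swap 'sum' for 'mean' to average the numeric columns instead.
agg_dict = {c: 'sum' if c in numeric_cols else 'last' for c in df_train.columns if c != 'Supplier'}

grouped = df_train.groupby('Supplier').agg(agg_dict)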

With regard to your one-hot encoded columns, you will need to provide more information about how they can be identified.
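In the meantime, here is a minimal sketch of one way the three aggregations might be assigned automatically. It rests on an assumption that is not confirmed by the question: that a one-hot column can be recognised as a boolean column or a numeric column containing only 0 and 1. The helper name build_agg_dict and the toy column names (date, price, cat_red, cat_blue) are hypothetical and only there to make the example runnable.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def build_agg_dict(df, key):
    """Map each non-key column to 'sum', 'mean', or 'last'."""
    numeric_cols = set(df.select_dtypes('number').columns)
    agg = {}
    for col in df.columns:
        if col == key:
            continue  # the grouping key is not aggregated
        if is_bool_dtype(df[col]):
            # Assumption: boolean columns come from one-hot encoding.
            agg[col] = 'sum'
        elif col in numeric_cols:
            # Assumption: a numeric column holding only 0/1 is one-hot encoded.
            is_one_hot = df[col].dropna().isin([0, 1]).all()
            agg[col] = 'sum' if is_one_hot else 'mean'
        else:
            agg[col] = 'last'  # datetimes and other non-numeric columns
    return agg


# Toy stand-in for df_train; every column name except 'Supplier' is made up.
df_train = pd.DataFrame({
    'Supplier': ['A', 'A', 'B'],
    'date': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01']),
    'price': [10.0, 20.0, 5.0],
    'cat_red': [1, 0, 1],
    'cat_blue': [0, 1, 0],
})

agg_dict = build_agg_dict(df_train, 'Supplier')
grouped = df_train.groupby('Supplier').agg(agg_dict)
```

The 0/1 heuristic will misclassify a genuinely numeric column that happens to contain only zeros and ones, so it is worth checking the resulting agg_dict before using it. If the one-hot columns share a name prefix or sit in a known positional range, selecting them by name or by slicing df_train.columns is likely to be more reliable.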
