Here is the scenario:
I have a large ordered dataset with 314 columns and over 300,000 rows for an ML problem.
I want to group the dataset by column X (Supplier).
Desired output:
df_train.groupby('Supplier').agg({<some columns>: 'last', <some columns>: 'sum', <some columns>: 'mean'})
Since this is a 314-column dataset, I can't just write out a dict containing every column by hand.
PS: I ordered the columns in the sequence in which I want to apply the different aggregations.
You could use select_dtypes to get the columns that are numeric, and use these in a dictionary comprehension.
# select_dtypes('number') picks out the numeric columns
numeric_cols = df_train.select_dtypes('number').columns
# sum the numeric columns, keep the last value of everything else;
# skip the grouping key itself, since it becomes the index after groupby
agg_dict = {c: 'sum' if c in numeric_cols else 'last' for c in df_train.columns if c != 'Supplier'}
grouped = df_train.groupby('Supplier').agg(agg_dict)
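For illustration, here is a minimal, self-contained sketch of the same idea on a small made-up DataFrame (the column names and values below are hypothetical, not taken from your dataset):

import pandas as pd

# Hypothetical toy data standing in for df_train
df_train = pd.DataFrame({
    'Supplier': ['A', 'A', 'B', 'B', 'B'],
    'Quantity': [10, 5, 7, 1, 2],               # numeric -> 'sum'
    'Price': [1.5, 2.0, 3.0, 4.0, 5.0],         # numeric -> 'sum'
    'Country': ['BR', 'BR', 'US', 'US', 'DE'],  # non-numeric -> 'last'
})

numeric_cols = df_train.select_dtypes('number').columns
agg_dict = {c: 'sum' if c in numeric_cols else 'last'
            for c in df_train.columns if c != 'Supplier'}

grouped = df_train.groupby('Supplier').agg(agg_dict)
print(grouped)
#           Quantity  Price Country
# Supplier
# A               15    3.5      BR
# B               10   12.0      DE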
Regarding your one-hot encoded columns, you will need to provide more information about how they can be identified.