Pandas groupby using different agg methods for different columns
Here is the scenario:
I have a large ordered dataset with 314 columns and over 300,000 rows for an ML problem.
I want to group the dataset by column X (suppliers).
Desired output:
Since this is a 314-column dataset, I can't just write out a dict listing every column by hand.
df_train.groupby('Supplier').agg({<some columns> : 'last', <some columns>: 'sum', <some columns>: 'mean' })
PS: I ordered the columns to follow the sequence in which I want to apply the different aggregations.
You could use select_dtypes to get the numeric columns, and use these in a dictionary comprehension.
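# 'number' matches every numeric dtype (ints and floats)
numeric_cols = df_train.select_dtypes(include='number').columns
# Sum numeric columns and take the last value of the rest; the grouping key is left out
agg_dict = {c: 'sum' if c in numeric_cols else 'last' for c in df_train.columns if c != 'Supplier'}
grouped = df_train.groupby('Supplier').agg(agg_dict)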
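If the columns really are ordered by the aggregation you want, as the PS in the question says, another option is to build the dict from positional slices instead of dtypes. The following is only a sketch on a toy frame: the column names and the block boundaries in `blocks` are made up for illustration and would have to be replaced with the real positions.

```python
import pandas as pd

# Toy frame standing in for df_train; names and block boundaries are hypothetical.
df_train = pd.DataFrame({
    "Supplier": ["A", "A", "B", "B"],
    "region": ["north", "south", "east", "west"],  # block 1: keep last
    "qty": [1, 2, 3, 4],                           # block 2: sum
    "cost": [10.0, 20.0, 30.0, 40.0],              # block 2: sum
    "rating": [1.0, 3.0, 2.0, 4.0],                # block 3: mean
})

# Every column except the grouping key, in their existing order.
value_cols = [c for c in df_train.columns if c != "Supplier"]

# (start, stop, aggregation) given as positions within value_cols.
blocks = [(0, 1, "last"), (1, 3, "sum"), (3, 4, "mean")]

# Expand each block into one dict entry per column.
agg_dict = {
    col: how
    for start, stop, how in blocks
    for col in value_cols[start:stop]
}

grouped = df_train.groupby("Supplier").agg(agg_dict)
print(grouped)
```

With 314 columns, only the three tuples in `blocks` need to be maintained by hand.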
With regard to your one-hot encoded columns, you would need to provide more information on how they can be identified.
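As one possible way to identify them, here is a sketch that assumes a one-hot column is any numeric column containing only 0 and 1. That rule is a guess, not something stated in the question, and it would also catch ordinary binary flags. The toy column names and the choice of 'max' for those columns are likewise only illustrative; 'mean' would instead give the share of rows per supplier.

```python
import pandas as pd

# Toy frame standing in for df_train; all column names are hypothetical.
df_train = pd.DataFrame({
    "Supplier": ["A", "A", "B", "B"],
    "city": ["x", "y", "z", "w"],
    "amount": [1.5, 2.5, 3.0, 4.0],
    "is_red": [1, 0, 0, 0],
    "is_blue": [0, 1, 1, 1],
})

value_cols = df_train.columns.drop("Supplier")
numeric_cols = df_train[value_cols].select_dtypes(include="number").columns

# Assumption: a one-hot column is a numeric column whose values are all 0 or 1.
one_hot_cols = [c for c in numeric_cols if df_train[c].isin([0, 1]).all()]

def pick_agg(col):
    """Choose an aggregation name for one column."""
    if col in one_hot_cols:
        return "max"   # 1 if any row of the supplier has this category
    if col in numeric_cols:
        return "sum"
    return "last"

agg_dict = {c: pick_agg(c) for c in value_cols}
grouped = df_train.groupby("Supplier").agg(agg_dict)
print(grouped)
```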