Python Pandas groupby with a date column that has different values, returning a DataFrame with the date column filled with the latest date

Question

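So I have data like this:

I want to group the rows and sum the values for Month 0 through Month 3, which I can do with pandas groupby. The problem is that the End date column has different values within a group, and I want to take the latest date in that column. For this example, that means I want the End date column to have the value 2020-09-25, as follows:

How can I do this with pandas groupby? For convenience, here are the variables holding the column names: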

details_columns = [ "Person Name", "Bill rate", "Project ERP","Status", "Assignment", "Engagement Code", "End date"]
sum_columns = ["Month 0", "Month 1", "Month 2", "Month 3"]

I need the return value to be a DataFrame. I hope someone can help. Thanks!

Text data:

Person Name Bill rate Project ERP Status Assignment Engagement Code End date Current Month U% Month 1 U% Month 2 U% Month 3 U%
John Doe 3500000 0.58 Chargeable - Standard Project A 21572323 2020-08-22 0 0.5 0.3 0.2
John Doe 3500000 0.58 Chargeable - Standard Project A 21572323 2020-05-22 0.4 0.25 0 0
John Doe 3500000 0.45 Chargeable - Standard Project B 21579528 2020-09-25 0 0.7 0.7 0.7
John Doe 3500000 0.45 Chargeable - Standard Project B 21579528 2020-05-22 0.2 0.12 0 0
John Doe 3500000 0.45 Chargeable - Standard Project B 21579528 2020-04-03 0.1 0 0 0

Answer 1

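Create a dictionary d that maps each sum column to 'sum' and the End date column to 'max', aggregate with GroupBy.agg, and finally add DataFrame.reindex so that the columns come out in the same order as in the original DataFrame: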

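# Columns that identify a group (everything except End date and the sums)
cols = ["Person Name", "Bill rate", "Project ERP","Status", "Assignment","Engagement Code"]
sum_columns = ["Current Month U%", "Month 1 U%", "Month 2 U%","Month 3 U%"]

# Sum every month column, take the latest End date per group
d = dict.fromkeys(sum_columns, 'sum')
d["End date"] = 'max'

# Aggregate, then restore the original column order
df1 = df.groupby(cols, as_index=False).agg(d).reindex(df.columns, axis=1)
print(df1)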
  Person Name  Bill rate  Project ERP       Status          Assignment  \
0    John Doe    3500000         0.45  Chargeable   Standard Project B   
1    John Doe    3500000         0.58  Chargeable   Standard Project A   

   Engagement Code    End date  Current Month U%  Month 1 U%  Month 2 U%  \
0         21579528  2020-09-25               0.3        0.82         0.7   
1         21572323  2020-08-22               0.4        0.75         0.3   

   Month 3 U%  
0         0.7  
1         0.2
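
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample from the question's text data and applies the same aggregation. Two things in it are assumptions rather than part of the original answer: the way the Status and Assignment fields are split (the flattened text data is ambiguous, and the printed output above splits them differently, which does not change the grouping), and the pd.to_datetime conversion, which is added so that 'max' compares real dates instead of strings.

```python
import pandas as pd

columns = [
    "Person Name", "Bill rate", "Project ERP", "Status", "Assignment",
    "Engagement Code", "End date",
    "Current Month U%", "Month 1 U%", "Month 2 U%", "Month 3 U%",
]

# Rows copied from the question's text data.
# Assumption: Status is "Chargeable - Standard" and Assignment is "Project A/B".
rows = [
    ["John Doe", 3500000, 0.58, "Chargeable - Standard", "Project A", 21572323, "2020-08-22", 0.0, 0.5, 0.3, 0.2],
    ["John Doe", 3500000, 0.58, "Chargeable - Standard", "Project A", 21572323, "2020-05-22", 0.4, 0.25, 0.0, 0.0],
    ["John Doe", 3500000, 0.45, "Chargeable - Standard", "Project B", 21579528, "2020-09-25", 0.0, 0.7, 0.7, 0.7],
    ["John Doe", 3500000, 0.45, "Chargeable - Standard", "Project B", 21579528, "2020-05-22", 0.2, 0.12, 0.0, 0.0],
    ["John Doe", 3500000, 0.45, "Chargeable - Standard", "Project B", 21579528, "2020-04-03", 0.1, 0.0, 0.0, 0.0],
]
df = pd.DataFrame(rows, columns=columns)

# Assumption: parse the dates so 'max' works on timestamps, not on strings.
df["End date"] = pd.to_datetime(df["End date"])

cols = ["Person Name", "Bill rate", "Project ERP", "Status", "Assignment", "Engagement Code"]
sum_columns = ["Current Month U%", "Month 1 U%", "Month 2 U%", "Month 3 U%"]

# Sum the month columns and keep the latest End date in each group.
d = dict.fromkeys(sum_columns, "sum")
d["End date"] = "max"

# Aggregate per group, then put the columns back in their original order.
df1 = df.groupby(cols, as_index=False).agg(d).reindex(df.columns, axis=1)
print(df1)
```

Because the dates in the question are already in ISO format (YYYY-MM-DD), 'max' on the raw strings would pick the same rows; the conversion mainly guards against other date formats.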
