Pandas add column using groupby dataframe by sorting date column
Python Pandas groupby, with a date column with different values, then returns a dataframe with the date column filled with the latest date
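I want to group the rows and sum the values of Month 0 through Month 3, which I can already do with pandas groupby. The problem is that the End date column holds different values within a group, and I want to keep the latest date. In this example, that means the End date value should be 2020-09-25.
How can I do this with pandas groupby? For convenience, the column names are stored in these variables: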
details_columns = [ "Person Name", "Bill rate", "Project ERP","Status", "Assignment", "Engagement Code", "End date"]
sum_columns = ["Month 0", "Month 1", "Month 2", "Month 3"]
The result needs to be a DataFrame. I hope someone can help. Thanks!
Text data:
Person Name  Bill rate  Project ERP  Status                 Assignment  Engagement Code  End date    Current Month U%  Month 1 U%  Month 2 U%  Month 3 U%
John Doe     3500000    0.58         Chargeable - Standard  Project A   21572323         2020-08-22  0                 0.5         0.3         0.2
John Doe     3500000    0.58         Chargeable - Standard  Project A   21572323         2020-05-22  0.4               0.25        0           0
John Doe     3500000    0.45         Chargeable - Standard  Project B   21579528         2020-09-25  0                 0.7         0.7         0.7
John Doe     3500000    0.45         Chargeable - Standard  Project B   21579528         2020-05-22  0.2               0.12        0           0
John Doe     3500000    0.45         Chargeable - Standard  Project B   21579528         2020-04-03  0.1               0           0           0
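Create a dictionary d that maps each of the sum columns to 'sum' and the End date column to 'max', pass it to GroupBy.agg to aggregate, and finally call DataFrame.reindex so the columns come back in the same order as in the original DataFrame. Taking 'max' returns the latest date as long as End date is a datetime column or a string in ISO YYYY-MM-DD format, as it is here: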
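# df is the DataFrame holding the data shown above
# Columns that identify a group
cols = ["Person Name", "Bill rate", "Project ERP", "Status", "Assignment", "Engagement Code"]
sum_columns = ["Current Month U%", "Month 1 U%", "Month 2 U%", "Month 3 U%"]
# Sum every monthly column and take the latest End date per group
d = dict.fromkeys(sum_columns, 'sum')
d["End date"] = 'max'
# Aggregate, then restore the original column order
df1 = df.groupby(cols, as_index=False).agg(d).reindex(df.columns, axis=1)
print(df1)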
  Person Name  Bill rate  Project ERP               Status Assignment  \
0    John Doe    3500000         0.45  Chargeable Standard  Project B
1    John Doe    3500000         0.58  Chargeable Standard  Project A

   Engagement Code    End date  Current Month U%  Month 1 U%  Month 2 U%  \
0         21579528  2020-09-25               0.3        0.82         0.7
1         21572323  2020-08-22               0.4        0.75         0.3

   Month 3 U%
0         0.7
1         0.2
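
To try the approach without the original file, here is a minimal self-contained sketch. It rebuilds the five sample rows by hand, transcribed from the text data in the question. The variable names rows and agg_map are illustrative. The pd.to_datetime conversion is an addition that the original answer does not include, made on the assumption that a real datetime column is the safer choice when date strings are not guaranteed to be in ISO format.

```python
import pandas as pd

# Group-identifying columns and the columns to be summed (from the answer above).
cols = ["Person Name", "Bill rate", "Project ERP", "Status",
        "Assignment", "Engagement Code"]
sum_columns = ["Current Month U%", "Month 1 U%", "Month 2 U%", "Month 3 U%"]

# Sample rows transcribed by hand from the question's text data.
rows = [
    ("John Doe", 3500000, 0.58, "Chargeable - Standard", "Project A", 21572323, "2020-08-22", 0.0, 0.5, 0.3, 0.2),
    ("John Doe", 3500000, 0.58, "Chargeable - Standard", "Project A", 21572323, "2020-05-22", 0.4, 0.25, 0.0, 0.0),
    ("John Doe", 3500000, 0.45, "Chargeable - Standard", "Project B", 21579528, "2020-09-25", 0.0, 0.7, 0.7, 0.7),
    ("John Doe", 3500000, 0.45, "Chargeable - Standard", "Project B", 21579528, "2020-05-22", 0.2, 0.12, 0.0, 0.0),
    ("John Doe", 3500000, 0.45, "Chargeable - Standard", "Project B", 21579528, "2020-04-03", 0.1, 0.0, 0.0, 0.0),
]
df = pd.DataFrame(rows, columns=cols + ["End date"] + sum_columns)

# Assumption (not in the original answer): parse End date as datetime so
# that 'max' compares real dates rather than strings.
df["End date"] = pd.to_datetime(df["End date"])

# Map each monthly column to 'sum' and End date to 'max'.
agg_map = dict.fromkeys(sum_columns, "sum")
agg_map["End date"] = "max"

# Aggregate per group, then restore the original column order.
df1 = df.groupby(cols, as_index=False).agg(agg_map).reindex(df.columns, axis=1)
print(df1)
```

The result is a regular DataFrame with one row per group, where End date holds the latest date found in that group.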