
Ranking and Aggregating by dates on a group in pandas df

I'm trying to create a new column that calculates the % completion per Project ID. I'm currently calculating the Project Week Num (grouped by Project ID), but I want to calculate the % completion, meaning the current [Project Week Num] based on ReportDate divided by the total number of [Project Week Num] values for that project.

Here is the code I use to calculate the current Project Week Num:

df['Project Week Num'] = df.groupby(['Project ID'])["ReportDate"].transform(lambda x: list(map(lambda y: dict(map(reversed, dict(enumerate(x.unique())).items()))[y]+1,x)))
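For reference, the nested lambda above numbers each distinct ReportDate within a project in order of first appearance, starting at 1. A shorter sketch that should give the same numbering uses `pd.factorize` inside the same `groupby`/`transform`. The sample data below is hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "Project ID": ["A", "A", "A", "B", "B"],
    "ReportDate": ["2021-01-04", "2021-01-11", "2021-01-11",
                   "2021-02-01", "2021-02-08"],
})

# pd.factorize labels distinct values 0, 1, 2, ... in order of first
# appearance, so adding 1 gives a 1-based week number per project.
df["Project Week Num"] = (
    df.groupby("Project ID")["ReportDate"]
      .transform(lambda x: pd.factorize(x)[0] + 1)
)

print(df)
```

Rows that share a ReportDate within a project get the same week number.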

For the example in the screenshot, this project has 106 total reports, so when

Project Week Num = 1, the [% Project Completion] would be 0.94%

Project Week Num = 2, the [% Project Completion] would be 1.89%

etc.
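The arithmetic behind these figures can be checked with a short sketch. The helper name `pct_complete` is made up for illustration, and values are rounded to two decimals:

```python
total_reports = 106  # total from the question's example


def pct_complete(week_num, total):
    """Percent completion for a 1-based week number, rounded to 2 decimals."""
    return round(week_num / total * 100, 2)


print(pct_complete(1, total_reports))    # 0.94
print(pct_complete(2, total_reports))    # 1.89
print(pct_complete(106, total_reports))  # 100.0
```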


Use:

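# If ReportDate values are unique within each Project ID
# and the rows are already sorted by ReportDate.
# cumcount() starts at 0, so add 1 to get a 1-based week number.
df['Project Week Num'] = df.groupby('Project ID').cumcount() + 1

# Total number of rows (reports) per Project ID, aligned to each row
s = df.groupby('Project ID')['Project ID'].transform('size')
# Fraction completed; multiply by 100 for a percentage
df['%'] = df['Project Week Num'].div(s)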
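The snippet above assumes each ReportDate appears only once per project. If a project can have several rows for the same date, one possible variant (a sketch, not part of the original answer) is to take a dense rank of ReportDate within each project and divide by the number of distinct dates. The sample data and the output column name `% Project Completion` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; project "A" has two rows for the same date.
df = pd.DataFrame({
    "Project ID": ["A", "A", "A", "A", "B", "B"],
    "ReportDate": pd.to_datetime([
        "2021-01-04", "2021-01-04", "2021-01-11", "2021-01-18",
        "2021-02-01", "2021-02-08",
    ]),
})

grouped = df.groupby("Project ID")["ReportDate"]

# Dense rank: rows sharing a ReportDate get the same 1-based week number,
# and the numbering follows date order regardless of row order.
df["Project Week Num"] = grouped.rank(method="dense").astype(int)

# Number of distinct report dates per project, aligned to each row.
total_weeks = grouped.transform("nunique")

# Percent completion, rounded to two decimals.
df["% Project Completion"] = (
    df["Project Week Num"].div(total_weeks).mul(100).round(2)
)

print(df)
```

Unlike `cumcount()`, the dense rank does not depend on the rows being pre-sorted, at the cost of a slightly more expensive groupby operation.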
