Edit: Apologies, I left out an important grouping of the data. Thanks to those who already helped.
I have a data set with missing data, and I have filled the missing values with 0. Using Python and pandas, I am trying to get a metric for each team: the % of apps they are working on that are complete. My thought was to group by Col A and then do counts on Col C, but I can't figure out how to get the count of complete rows and the count of total rows to do the calculation. Any ideas are much appreciated.
So I want something that looks like this:
Team A App1 High 0%
Team A App3 Med 100%
Team B App2 Med 0%
And so on.
My df looks like the following
+--------+-------+-------+----------+
| Col A | Col B | Col C | Col D |
+--------+-------+-------+----------+
| Team A | App1 | High | 0 |
| Team A | App1 | High | 0 |
| Team A | App3 | Med | Complete |
| Team B | App2 | Med | 0 |
| Team B | App2 | High | Complete |
| Team C | App1 | Low | Complete |
+--------+-------+-------+----------+
Compare Col D with 'Complete' inside each group. The sum of that comparison is the number of complete rows and len(x) is the group total:
df['count'] = df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].transform(lambda x: (x == 'Complete').sum())
df['share'] = df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].transform(lambda x: '{:.2f}%'.format((x == 'Complete').sum() / len(x) * 100))
yields:
Col A Col B Col C Col D count share
0 Team A App1 High 0 0 0.00%
1 Team A App1 High 0 0 0.00%
2 Team A App3 Med Complete 1 100.00%
3 Team B App2 Med 0 0 0.00%
4 Team B App2 High Complete 1 100.00%
5 Team C App1 Low Complete 1 100.00%
or, for one row per group:
df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].apply(lambda x: '{:.2f}%'.format((x == 'Complete').sum() / len(x) * 100))
Col A Col B Col C
Team A App1 High 0.00%
App3 Med 100.00%
Team B App2 High 100.00%
Med 0.00%
Team C App1 Low 100.00%
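If you want the complete count and the total as separate columns, as the question describes, one possible sketch uses named aggregation on a boolean helper column. The sample frame is rebuilt from the table in the question; the names is_complete, complete, total and pct_complete are made up for illustration, and the code assumes the filled-in missing values are the integer 0.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (missing values filled with 0).
df = pd.DataFrame({
    "Col A": ["Team A", "Team A", "Team A", "Team B", "Team B", "Team C"],
    "Col B": ["App1", "App1", "App3", "App2", "App2", "App1"],
    "Col C": ["High", "High", "Med", "Med", "High", "Low"],
    "Col D": [0, 0, "Complete", 0, "Complete", "Complete"],
})

# Hypothetical helper column: True where the row is marked complete.
summary = (
    df.assign(is_complete=df["Col D"].eq("Complete"))
      .groupby(["Col A", "Col B", "Col C"])["is_complete"]
      .agg(complete="sum", total="size")  # complete rows and all rows per group
      .reset_index()
)

# Numeric percentage; format it as a string only when displaying.
summary["pct_complete"] = summary["complete"] / summary["total"] * 100
print(summary)
```

Keeping pct_complete numeric rather than a formatted string means the result can still be sorted or filtered afterwards.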