How to calculate percentage of missing values
Edit: Apologies, I missed an important grouping of the data. Thanks to those who already helped.
I have a data set that has missing data, and I have filled the missing values with 0. Using Python and pandas, I am trying to get a metric for each team: the % of apps they are working on that are complete. My thought was to group by Col A and then do counts on Col C, but I can't figure out how to get the count of complete rows and the total count to do the calculation. Any ideas are much appreciated.
So I want something that looks like this:
Team A App1 High 0%
Team A App3 Med 100%
Team B App2 Med 0%
And so on.
My df looks like the following:
+--------+-------+-------+----------+
| Col A | Col B | Col C | Col D |
+--------+-------+-------+----------+
| Team A | App1 | High | 0 |
| Team A | App1 | High | 0 |
| Team A | App3 | Med | Complete |
| Team B | App2 | Med | 0 |
| Team B | App2 | High | Complete |
| Team C | App1 | Low | Complete |
+--------+-------+-------+----------+
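# Per (Col A, Col B, Col C) group: number of rows where Col D is 0 (the filled-in missing values)
df['count'] = df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].transform(lambda x: (x==0).sum())
# Per group: share of rows where Col D is 0, formatted as a percentage string
df['share'] = df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].transform(lambda x: '{:.2f}%'.format((x==0).sum()/len(x)*100))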
yields:
Col A Col B Col C Col D count share
0 Team A App1 High 0 2 100.00%
1 Team A App1 High 0 2 100.00%
2 Team A App3 Med Complete 0 0.00%
3 Team B App2 Med 0 1 100.00%
4 Team B App2 High Complete 0 0.00%
5 Team C App1 Low Complete 0 0.00%
or just:
df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].apply(lambda x: '{:.2f}%'.format((x==0).sum()/len(x)*100))
Col A   Col B  Col C
Team A  App1   High     100.00%
        App3   Med        0.00%
Team B  App2   High       0.00%
               Med      100.00%
Team C  App1   Low        0.00%
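Note that the snippets above report the share of rows where Col D is 0, i.e. the missing share, while the question asks for the share that is complete. A minimal sketch of the complementary calculation follows. It assumes that 'Complete' is the only value marking a finished row, and the names `is_complete` and `pct_complete` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Col A': ['Team A', 'Team A', 'Team A', 'Team B', 'Team B', 'Team C'],
    'Col B': ['App1', 'App1', 'App3', 'App2', 'App2', 'App1'],
    'Col C': ['High', 'High', 'Med', 'Med', 'High', 'Low'],
    'Col D': [0, 0, 'Complete', 0, 'Complete', 'Complete'],
})

pct_complete = (
    # Assumption: a row counts as complete only when Col D == 'Complete'
    df.assign(is_complete=df['Col D'].eq('Complete'))
      .groupby(['Col A', 'Col B', 'Col C'])['is_complete']
      .mean()            # mean of a boolean column = fraction of True values
      .mul(100)          # convert the fraction to a percentage
      .rename('pct_complete')
      .reset_index()     # one row per (Col A, Col B, Col C) group
)

print(pct_complete)
```

Keeping the result numeric rather than formatting it as a string leaves it sortable and easy to aggregate further. The percent sign can be added at display time.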