How to calculate percentage of missing values

Edit: Apologies, I originally left out an important grouping of the data. Thanks to those who have already helped.

I have a data set with missing data, and I have filled the missing values with 0. Using Python and pandas, I am trying to get a metric for each team: the percentage of the apps they are working on that are complete. My thought was to group by Col A and then count on Col C, but I can't figure out how to get the count of complete rows and the total count needed for the calculation. Any ideas are much appreciated.

So I want something that looks like this:

  Team A  App1 High 0%
  Team A  App3 Med  100%
  Team B  App2 Med  0%
  And so on. 

My df looks like the following:

  +--------+-------+-------+----------+
  | Col A  | Col B | Col C |  Col D   |
  +--------+-------+-------+----------+
  | Team A | App1  | High  | 0        |
  | Team A | App1  | High  | 0        |
  | Team A | App3  | Med   | Complete |
  | Team B | App2  | Med   | 0        |
  | Team B | App2  | High  | Complete |
  | Team C | App1  | Low   | Complete |
  +--------+-------+-------+----------+
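For anyone who wants to run the snippets in this thread, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes the filled-in values in Col D are the integer 0 and that finished rows hold the string 'Complete', which the post implies but does not state outright.

```python
import pandas as pd

# Assumption: missing values were filled with the integer 0,
# and completed rows carry the string 'Complete'.
df = pd.DataFrame({
    'Col A': ['Team A', 'Team A', 'Team A', 'Team B', 'Team B', 'Team C'],
    'Col B': ['App1', 'App1', 'App3', 'App2', 'App2', 'App1'],
    'Col C': ['High', 'High', 'Med', 'Med', 'High', 'Low'],
    'Col D': [0, 0, 'Complete', 0, 'Complete', 'Complete'],
})
print(df)
```

If the 0 values were stored as the string '0' instead, comparisons such as `x == 0` would need to be adjusted to match.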
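# Per (Col A, Col B, Col C) group: number of rows where Col D is the 0 placeholder
df['count'] = df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].transform(lambda x: (x==0).sum())
# Per group: share of rows that are 0, formatted as a percentage string
df['share'] = df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].transform(lambda x: '{:.2f}%'.format((x==0).sum()/len(x)*100))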

yields:

      Col A    Col B    Col C       Col D count    share
0   Team A    App1     High             0     2  100.00%
1   Team A    App1     High             0     2  100.00%
2   Team A    App3     Med      Complete      0    0.00%
3   Team B    App2     Med              0     1  100.00%
4   Team B    App2     High     Complete      0    0.00%
5   Team C    App1     Low      Complete      0    0.00%

or just:

df.groupby(['Col A', 'Col B', 'Col C'])['Col D'].apply(lambda x: '{:.2f}%'.format((x==0).sum()/len(x)*100))

Col A     Col B    Col C  
 Team A    App1     High      100.00%
           App3     Med         0.00%
 Team B    App2     High        0.00%
                    Med       100.00%
 Team C    App1     Low         0.00%
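Note that the snippets above report the share of 0 (missing) rows, whereas the desired output in the question is the share of rows that are complete, which is the complement. A minimal sketch of that variant follows, assuming completed rows hold the string 'Complete'; the names `complete` and `pct_complete` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Col A': ['Team A', 'Team A', 'Team A', 'Team B', 'Team B', 'Team C'],
    'Col B': ['App1', 'App1', 'App3', 'App2', 'App2', 'App1'],
    'Col C': ['High', 'High', 'Med', 'Med', 'High', 'Low'],
    'Col D': [0, 0, 'Complete', 0, 'Complete', 'Complete'],
})

cols = ['Col A', 'Col B', 'Col C']

# Flag completed rows, then average the flag per group:
# the mean of a boolean column is the fraction of True values.
pct_complete = (
    df.assign(complete=df['Col D'].eq('Complete'))
      .groupby(cols)['complete']
      .mean()
      .mul(100)
      .reset_index(name='pct_complete')
)
print(pct_complete)
```

Keeping the result numeric rather than formatting it as a string leaves it usable for sorting or further arithmetic; formatting can be applied at display time.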
