Pandas: percentage of NaN values per column

Question

Goal: get the percentage of missing values for each column of df, broken down by client.

My df is about ticket creation:

          id                type  ...      priority          Client
0     56 113            Incident  ...          Low           client1
1     56 267             Demande  ...          High          client1
2     56 294            Incident  ...          Nan           NaN
3     56 197             Demande  ...          Low           client3
4     56 143             Demande  ...          Nan           client4

First attempt:

df.notna().sum()/len(agg_global)*100
Out[29]:                       
id                       97.053453   
type                     76.415869   
priority                 82.626625    
client                   84.596443

This is very useful, but I would like to add more detail to the output, with the client dimension as columns, like this:

Output I would like to create:

                           Client1   Client2     Client3      NaN
id                      100.000000   100.000000  100.000000   66.990424
type                     76.415869   66.990424   76.415869    43.761970
status                  100.000000   100.000000  66.990424    76.415869
category                66.990424   43.761970   76.415869     43.761970
entity                   43.761970   100.000000  76.415869    76.415869
source_demande           84.596443   100.000000  76.415869    43.761970

I tried using groupby, but could not get the desired output:

                   id       type  ...      priority         Client
client                            ...                             
True        97.053453  76.415869  ...      29.98632       29.98632

Any suggestions would be appreciated. Thank you for your attention!

Answer 1

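You can drop the Client column so that it is not itself tested for missing values, test the remaining columns with DataFrame.isna, aggregate the mean per Client (replacing NaN in Client first so that group is not lost), and finally transpose with DataFrame.T. The result is the fraction of missing values per column and client, between 0 and 1: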

print (df)
       id      type priority   Client
0     NaN  Incident      Low  client1
1     NaN       NaN     High  client1
2  56 294  Incident      Nan      NaN
3  56 197       NaN      Low  client3
4     NaN   Demande      NaN  client4


df = (df.drop(columns='Client')
        .isna()
        .groupby(df['Client'].fillna('NaN'))
        .mean()
        .rename_axis(None)
        .T)
print (df)
          NaN  client1  client3  client4
id        0.0      1.0      0.0      1.0
type      0.0      0.5      1.0      0.0
priority  0.0      0.0      0.0      1.0
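
The same chain can be scaled to percentages and flipped to show present rather than missing values, which is the orientation of the table in the question. The following is a minimal sketch that rebuilds the sample frame above; the names missing_pct and present_pct are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the frame printed above.
# The string 'Nan' in priority is ordinary text, so isna() does not flag it.
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

# Percentage of missing values per column, one column per client.
missing_pct = (
    df.drop(columns="Client")
      .isna()                                  # True where a value is missing
      .groupby(df["Client"].fillna("NaN"))     # keep rows without a client as their own group
      .mean()                                  # mean of booleans = fraction missing
      .mul(100)                                # fraction -> percentage
      .rename_axis(None)
      .T
)

# Percentage of values that are present, as in the question's desired output.
present_pct = 100 - missing_pct

print(missing_pct)
print(present_pct)
```

Real data may also contain text placeholders such as 'Nan'; those would have to be converted to real missing values first (for example with replace) if they should count as missing.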

Answer 2

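As far as I know, you can brute-force it. I would use the isna function together with a sum to count the NaN values in each row or column, and then compute the percentage from that count.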
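
A rough sketch of that idea, assuming a small sample frame shaped like the one in the question (the data and the names col_pct and row_pct are made up for illustration). It gives overall percentages only, without the per-client breakdown:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real tickets.
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

# Count NaN per column, then divide by the number of rows.
col_pct = df.isna().sum() / len(df) * 100

# Count NaN per row, then divide by the number of columns.
row_pct = df.isna().sum(axis=1) / df.shape[1] * 100

print(col_pct)
print(row_pct)
```

df.isna().mean() * 100 is a shorter way to get the same per-column figures.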
