Pandas: percentage of NaN values per column

Question

Goal: get the percentage of missing values for each column of df, broken down by client.

My df is about ticket creation:

          id                type  ...      priority          Client
0     56 113            Incident  ...          Low           client1
1     56 267             Demande  ...          High          client1
2     56 294            Incident  ...          Nan           NaN
3     56 197             Demande  ...          Low           client3
4     56 143             Demande  ...          Nan           client4

First attempt:

df.notna().sum()/len(agg_global)*100
Out[29]:                       
id                       97.053453   
type                     76.415869   
priority                 82.626625    
client                   84.596443

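This is very useful, but I would like to add more detail to my output, with the "Client" dimension as columns, as shown below.

The output I would like to create: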

                           Client1   Client2     Client3      NaN
id                      100.000000   100.000000  100.000000   66.990424
type                     76.415869   66.990424   76.415869    43.761970
status                  100.000000   100.000000  66.990424    76.415869
category                66.990424   43.761970   76.415869     43.761970
entity                   43.761970   100.000000  76.415869    76.415869
source_demande           84.596443   100.000000  76.415869    43.761970

I tried using "groupby", but could not get the desired output:

                   id       type  ...      priority         Client
client                            ...                             
True        97.053453  76.415869  ...      29.98632       29.98632

Any suggestions would be greatly appreciated. Thank you for your attention!

Answer 1

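You can drop the Client column so that it is not itself tested for missing values, test the remaining columns with DataFrame.isna, aggregate the mean by Client (replacing NaN in Client with a string so that those rows are not lost from the groups), and finally transpose with DataFrame.T: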

print (df)
       id      type priority   Client
0     NaN  Incident      Low  client1
1     NaN       NaN     High  client1
2  56 294  Incident      Nan      NaN
3  56 197       NaN      Low  client3
4     NaN   Demande      NaN  client4


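# Drop Client so it is not itself tested for missing values
df = (df.drop(columns='Client')
        .isna()
        .groupby(df['Client'].fillna('NaN'))  # rows with a missing Client form their own group
        .mean()                               # fraction of missing values per group
        .rename_axis(None)
        .T)
print (df)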
          NaN  client1  client3  client4
id        0.0      1.0      0.0      1.0
type      0.0      0.5      1.0      0.0
priority  0.0      0.0      0.0      1.0
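
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the sample frame printed above and scales the result to percentages. The helper name `missing_pct_by_client`, its `key` and `na_label` parameters, and the `* 100` scaling are my own additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed frame above.
# Note: the string "Nan" in priority is ordinary text, not a missing value.
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})


def missing_pct_by_client(frame, key="Client", na_label="NaN"):
    """Hypothetical helper: percent of missing values per column, per client."""
    # Label rows whose client is missing so they form their own group.
    groups = frame[key].fillna(na_label)
    return (frame.drop(columns=key)   # do not test the grouping column itself
                 .isna()              # True where a value is missing
                 .groupby(groups)
                 .mean()              # share of True per group, between 0 and 1
                 .mul(100)            # express as a percentage
                 .rename_axis(None)
                 .T)                  # columns of df become rows, clients become columns


result = missing_pct_by_client(df)
print(result)
```

If you are on pandas 1.1 or later, passing `dropna=False` to `groupby` should be another way to keep the missing-client group without the `fillna` step; in that case the column label would be an actual NaN rather than the string 'NaN'.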

Answer 2

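As far as I know, you can brute-force it. I would use the isna function together with sum to count the NaN values in each row or column, and then compute the percentage from those counts.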
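
A rough sketch of what that could look like, assuming the same sample frame as in Answer 1 (the names `col_pct` and `row_pct` are made up for this example). Note that this gives overall percentages only, without the per-client breakdown the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data assumed to match the frame printed in Answer 1.
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

# Count missing values per column, then divide by the number of rows.
col_pct = df.isna().sum() / len(df) * 100

# Count missing values per row, then divide by the number of columns.
row_pct = df.isna().sum(axis=1) / df.shape[1] * 100

print(col_pct)
print(row_pct)
```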
