Remove columns with missing values above a threshold in pandas

Question

I am doing data preprocessing and want to remove features/columns that have more than, say, 10% missing values.

I have written the code below:

df_missing=df.isna()
result=df_missing.sum()/len(df)
result

Default           0.010066
Income            0.142857
Age               0.109090
Name              0.047000
Gender            0.000000
Type of job       0.200000
Amt of credit     0.850090
Years employed    0.009003
dtype: float64

I want df to keep only the columns that do not have more than 10% missing values.

Expected output:

df

Default   Name   Gender   Years employed

(Columns with more than 10% missing values are removed.)

I have tried:

result.iloc[:,0] 
IndexingError: Too many indexers

Please help

Answer 1

Because dividing the sum by the length gives the mean, you can use df_missing.mean() instead of df_missing.sum()/len(df):

result = df.isna().mean()
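
A quick check of that equivalence, as a sketch on a small made-up frame (the column values below are invented for illustration and are not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's DataFrame.
df = pd.DataFrame({
    "Income": [50_000, np.nan, 42_000, np.nan],
    "Gender": ["F", "M", "M", "F"],
})

by_sum = df.isna().sum() / len(df)  # fraction missing, computed by hand
by_mean = df.isna().mean()          # mean of booleans gives the same fraction

# Both are Series indexed by column name with identical values.
same = bool(np.allclose(by_sum, by_mean))
```

The mean works here because True counts as 1 and False as 0, so the mean of the isna() mask is the share of missing rows in each column.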

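Then filter with DataFrame.loc, using : to select all rows and a boolean mask to select the columns. The mask must be True for the columns you want to keep, that is, those with less than 10% missing:

df = df.loc[:, result < .1]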
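
Putting the steps together, here is a minimal sketch on made-up data. The column names are borrowed from the question, the values are invented, and `threshold` is a name introduced here for the 10% cutoff. The mask passed to loc marks the columns to keep, so the comparison selects the columns below the cutoff. It also suggests why result.iloc[:,0] raised IndexingError in the question: result is a one-dimensional Series, so it accepts only a single indexer.

```python
import numpy as np
import pandas as pd

# Invented values; only the column names come from the question.
df = pd.DataFrame({
    "Default": [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    "Income": [np.nan, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "Gender": ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"],
    "Amt of credit": [np.nan] * 9 + [1.0],
})

threshold = 0.1  # hypothetical name for the 10% cutoff

# Series mapping each column name to its fraction of missing rows.
result = df.isna().mean()

# Boolean mask over columns: True where the column should be kept.
keep_mask = result < threshold

# ":" selects every row, keep_mask selects the columns.
filtered = df.loc[:, keep_mask]

kept_columns = list(filtered.columns)
```

With this data, Income is 20% missing and Amt of credit is 90% missing, so only Default and Gender remain. Use `result <= threshold` instead if a column that is exactly 10% missing should be kept.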

Answer 2

It should be df = df.loc[:,result < .1], because the asker only wants to keep the columns in which fewer than 10% of the rows are missing.
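
An alternative sketch, not taken from either answer: DataFrame.dropna with axis=1 and thresh keeps the columns that have at least thresh non-missing values, so the cutoff can be expressed as a minimum count of present rows. The data and the names `threshold` and `min_non_na` are assumptions made for illustration. Note that this variant keeps a column that is exactly 10% missing, which matches the "more than 10%" wording of the question but differs from a strict `result < .1` mask at that boundary.

```python
import math

import numpy as np
import pandas as pd

# Invented values; only the column names come from the question.
df = pd.DataFrame({
    "Default": [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    "Age": [np.nan, 31, 45, 27, 52, 38, 41, 29, 60, 33],   # 10% missing
    "Income": [np.nan, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],  # 20%
    "Amt of credit": [np.nan] * 9 + [1.0],                  # 90% missing
})

threshold = 0.1  # hypothetical name: maximum tolerated fraction of missing rows

# Minimum number of non-missing rows a column needs in order to be kept.
min_non_na = math.ceil((1 - threshold) * len(df))

# axis=1 drops columns; thresh is the required count of non-NA values.
filtered = df.dropna(axis=1, thresh=min_non_na)

kept_columns = list(filtered.columns)
```

Because the cutoff is converted to a row count with floating-point arithmetic, it is worth checking `min_non_na` for your own row count before relying on it.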

solution1
4 ACCEPTED 2020-02-28 11:30:24

solution2
1 2021-03-05 14:25:33