Filtering Pandas DataFrame for percentage of missing values

Question

I have a Pandas DataFrame with quite a few missing values, represented by np.nan. I would like to return the rows of the DataFrame in which more than 80% of the values are missing.

So far I have tried the following:

data.loc[lambda x: (len(x.isna()) / len(x.columns)) > .8]

but this is apparently not how loc works when passed a lambda function. I had assumed that Pandas would loop over the rows and apply the function to each one, expecting True or False in return to keep or discard that row, respectively. Essentially a filter function.

Is there a Pandas way of achieving what I want, or should I resort to plain Python?

Answer 1

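Use dropna with thresh (thresh: require that many non-NA values). Note that this keeps the rows in which at least 80% of the values are present, so it drops the mostly-missing rows rather than returning them: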

df.dropna(thresh=len(df.columns)*0.8)
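
To see what thresh does in practice, here is a minimal sketch on a made-up six-column frame (the data and the variable names min_non_na and kept are illustrative assumptions, not from the question). The threshold is rounded up to a whole number here, since thresh is documented as an integer count.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: the rows have 0, 4, 5 and 6 missing values out of 6 columns.
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [1.0, nan, nan, nan, nan, 2.0],
        [nan, nan, nan, nan, nan, 6.0],
        [nan, nan, nan, nan, nan, nan],
    ],
    columns=list("abcdef"),
)

# A row must have at least this many non-NA values to survive dropna.
min_non_na = int(np.ceil(len(df.columns) * 0.8))

# Keeps only the rows that are at least 80% populated.
kept = df.dropna(thresh=min_non_na)
print(kept)
```

Only the fully populated row survives here, which is the opposite of the rows the question asks for.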

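Update: to return the rows with more than 80% of their values missing, build a boolean mask from the share of missing values in each row: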

df[(df.isna().sum(1)/df.shape[1]).gt(0.8)]
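
As for the lambda in the question: loc does accept a callable, but it calls it once with the whole DataFrame and expects an indexer back (for example a boolean Series with one entry per row), not one True/False per call. In the original attempt, len(x.isna()) is the number of rows, so the lambda yields a single scalar. A minimal sketch of both the mask and the callable form, on made-up data (the frame and the names missing_frac, mostly_missing and same_rows are illustrative assumptions):

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: the rows have 0, 4, 5 and 6 missing values out of 6 columns.
data = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [1.0, nan, nan, nan, nan, 2.0],
        [nan, nan, nan, nan, nan, 6.0],
        [nan, nan, nan, nan, nan, nan],
    ],
    columns=list("abcdef"),
)

# The mean of a boolean row is the fraction of True values,
# i.e. the share of missing values in that row.
missing_frac = data.isna().mean(axis=1)

# Boolean-mask indexing: keep rows that are more than 80% missing.
mostly_missing = data[missing_frac > 0.8]

# The same filter written as a callable passed to loc; the lambda
# receives the whole DataFrame and returns a boolean Series.
same_rows = data.loc[lambda x: x.isna().mean(axis=1) > 0.8]

print(mostly_missing)
```

Taking isna().mean(axis=1) is equivalent to the isna().sum(1) / shape[1] expression above; it just avoids spelling out the division.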

1 answer

solution1
2 ACCEPTED 2018-10-18 21:22:00
