
Filtering Pandas DataFrame for percentage of missing values

I have a Pandas DataFrame with quite a few missing values, represented by np.nan. I would like to return the rows of the DataFrame in which more than 80% of the values are missing.

So far I have tried the following:

data.loc[lambda x: (len(x.isna()) / len(x.columns)) > .8]

but this is apparently not how loc works when it is passed a lambda function. I had assumed that Pandas simply loops over the rows, applies the function to each one, and expects True or False in return to indicate whether to keep or discard that row; essentially a filter function.

Is there a Pandas way of achieving this, or should I resort to plain Python?
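A minimal sketch of what loc actually does with a callable, using a small hypothetical 5-column DataFrame named data: loc calls the function once with the whole DataFrame, not once per row, and uses whatever it returns as the indexer. In the attempt above, len(x.isna()) is simply the number of rows, so the expression collapses to a single scalar. Returning a boolean Series with one value per row, such as x.isna().mean(axis=1) > 0.8, gives the filtering behaviour the question expects.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with 5 columns: row 1 is entirely NaN (100% missing),
# rows 0 and 2 have 4 of 5 values missing (exactly 80%), row 3 has none missing.
data = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [np.nan, np.nan, 2.0, 5.0],
        "c": [np.nan, np.nan, np.nan, 6.0],
        "d": [np.nan, np.nan, np.nan, 7.0],
        "e": [np.nan, np.nan, np.nan, 8.0],
    }
)

# loc calls the lambda once and passes the whole DataFrame as x.
# len() of a DataFrame is its number of rows, so len(x.isna()) is 4 here,
# and the original expression reduces to one scalar, not one bool per row.
print(len(data.isna()), len(data.columns))  # prints: 4 5

# Returning a boolean Series (one entry per row) gives a row filter:
# isna().mean(axis=1) is the fraction of missing values in each row.
mostly_missing = data.loc[lambda x: x.isna().mean(axis=1) > 0.8]
print(mostly_missing.index.tolist())  # prints: [1]
```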

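Using dropna with thresh (thresh: require that many non-NA values):

df.dropna(thresh=len(df.columns)*0.8)

Note that this keeps the rows with at least 80% non-missing values, i.e. it discards every row with more than 20% missing, so it returns the densely populated rows rather than the mostly empty ones the question asks for.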
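To see what the thresh-based call keeps, here is a small sketch on a hypothetical 5-column sample: dropna(thresh=k) retains rows with at least k non-NA values, which are the same rows as df[df.count(axis=1) >= k]. Rounding 0.8 * n_cols up with math.ceil is an assumption made here so that thresh receives an integer, as the Pandas documentation describes it; because the per-row counts are integers, count >= ceil(0.8 * n_cols) selects exactly the same rows as count >= 0.8 * n_cols. The last lines sketch how the same count can be turned around to pick the rows with more than 80% missing.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical sample data with 5 columns (same layout as a toy example:
# rows 0 and 2 are 80% missing, row 1 is 100% missing, row 3 is complete).
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [np.nan, np.nan, 2.0, 5.0],
        "c": [np.nan, np.nan, np.nan, 6.0],
        "d": [np.nan, np.nan, np.nan, 7.0],
        "e": [np.nan, np.nan, np.nan, 8.0],
    }
)
n_cols = len(df.columns)
non_na = df.count(axis=1)  # number of non-NA values in each row

# dropna(thresh=k) keeps the rows that have at least k non-NA values.
k = math.ceil(0.8 * n_cols)  # 4 for 5 columns
dense = df.dropna(thresh=k)
assert dense.index.equals(df[non_na >= k].index)
print(dense.index.tolist())  # prints: [3]  (the complete row, not the sparse ones)

# "More than 80% missing" means "fewer than 20% non-NA", so the rows the
# question asks for are those that dropna would discard with this threshold.
k_sparse = math.ceil(0.2 * n_cols)  # 1 for 5 columns
sparse = df[non_na < k_sparse]
print(sparse.index.tolist())  # prints: [1]
```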

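Update: to return the rows with more than 80% of their values missing, divide each row's count of missing values by the number of columns and keep the rows where that fraction exceeds 0.8:

df[(df.isna().sum(axis=1) / df.shape[1]).gt(0.8)]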
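The update's expression can be wrapped in a small reusable function; rows_mostly_missing and its frac parameter are hypothetical names chosen for this sketch, not part of Pandas. It uses isna().mean(axis=1) in place of isna().sum(axis=1) / df.shape[1]; the two are equal because isna() never produces missing values, so the mean divides by the number of columns.

```python
import numpy as np
import pandas as pd


def rows_mostly_missing(df: pd.DataFrame, frac: float = 0.8) -> pd.DataFrame:
    """Hypothetical helper: return the rows whose share of missing values exceeds frac."""
    # Fraction of NaN cells in each row; equal to isna().sum(axis=1) / df.shape[1].
    missing_share = df.isna().mean(axis=1)
    # gt() is a strict comparison, so a row at exactly frac is excluded.
    return df[missing_share.gt(frac)]


# Hypothetical 5-column sample: rows 0 and 2 are 80% missing, row 1 is 100%.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [np.nan, np.nan, 2.0, 5.0],
        "c": [np.nan, np.nan, np.nan, 6.0],
        "d": [np.nan, np.nan, np.nan, 7.0],
        "e": [np.nan, np.nan, np.nan, 8.0],
    }
)

print(rows_mostly_missing(df).index.tolist())  # prints: [1]
print(rows_mostly_missing(df, frac=0.5).index.tolist())  # prints: [0, 1, 2]
```

Because gt is a strict comparison, a row with exactly 80% of its values missing (4 of 5 here) is not returned; swapping gt for ge would include it.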

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM