I have a Pandas DataFrame with quite a few missing values that are being represented by np.nan. I would like to be able to return the rows in the DataFrame having more than 80% of their values missing.
So far I have tried the following:
data.loc[lambda x: (len(x.isna()) / len(x.columns)) > .8]
but this is apparently not how loc works when passed a lambda function. My interpretation of this was that Pandas was simply running a loop over each row and applying the function, expecting a True or False value in return indicating to keep or discard the row, respectively. Essentially a filter function.
Is there a Pandas way of achieving what I want, or should I resort to plain Python?
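Using dropna with thresh (thresh: require that many non-NA values). Note that this keeps the rows with at least that many non-NA values, so it drops the sparse rows rather than returning them: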
df.dropna(thresh=len(df.columns)*0.8)
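To make the thresh semantics concrete, here is a minimal sketch on made-up data (the three-row frame, its column names, and the variable names are assumptions for illustration, not from the question). It converts the threshold to an integer and shows which rows survive:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 3 rows, 10 columns, all values present at first
df = pd.DataFrame(np.ones((3, 10)), columns=list("abcdefghij"))
df.iloc[1, :9] = np.nan  # row 1: 9 of 10 values missing (90%)
df.iloc[2, :5] = np.nan  # row 2: 5 of 10 values missing (50%)

# thresh is a count of non-NA values, so convert the fraction to an integer
thresh = int(len(df.columns) * 0.8)

# Keeps only rows with at least `thresh` non-NA values
kept = df.dropna(thresh=thresh)
print(kept.index.tolist())
```

Only row 0 is kept here. Row 2, with half of its values missing, is dropped as well, so this call is not simply the inverse of "more than 80% missing".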
Update, to return the rows with more than 80% of their values missing:
df[(df.isna().sum(axis=1)/df.shape[1]).gt(0.8)]
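As for why the original attempt fails: a callable passed to loc is called once with the whole DataFrame, not once per row, and it has to return something valid for indexing, such as a boolean Series with one entry per row. In the question's lambda, len(x.isna()) is the number of rows, so the expression evaluates to a single True or False rather than a row mask. A sketch of an equivalent mask built with mean, on made-up data (the sample frame and variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 3 rows, 10 columns
df = pd.DataFrame(np.ones((3, 10)), columns=list("abcdefghij"))
df.iloc[1, :9] = np.nan  # row 1: 90% missing
df.iloc[2, :5] = np.nan  # row 2: 50% missing

# The mean of a boolean mask along axis=1 is the fraction of missing values per row
missing_frac = df.isna().mean(axis=1)
sparse_rows = df[missing_frac > 0.8]

# Same selection with loc: the lambda receives the entire DataFrame `d`
# and returns a boolean Series aligned with its index
same_rows = df.loc[lambda d: d.isna().mean(axis=1) > 0.8]

print(sparse_rows.index.tolist())
```

Both selections return only row 1 for this sample frame.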