
Subsetting variables with missing values in R

I have a dataset with 50 variables (columns), and in 30 of them more than half of the observations are missing.

I want to create a subset of the dataset that drops those 30 variables with too many missing values. I could remove them one by one, but I was wondering whether there is a quicker way to do it in R.

Logic: first iterate over the columns with sapply and check which columns have fewer than half of their values missing. The result of the first line is a logical vector, which the second line uses to subset the data.

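# TRUE for each column with fewer than half of its values missing
ind <- sapply(colnames(df), function(x) sum(is.na(df[[x]])) < nrow(df) / 2)
# Keep only the columns flagged TRUE
df <- df[colnames(df)[ind]]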
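
To see the rule in action, here is a minimal sketch on a made-up data frame. The name `toy` and its columns are hypothetical and only there for illustration. It uses `colMeans(is.na(...))` as a vectorised alternative to the `sapply` loop, which should give the same selection under the same strict `<` comparison.

```r
# Hypothetical toy data: 6 rows, 4 columns (not from the original question)
toy <- data.frame(
  id        = 1:6,
  mostly_na = c(NA, NA, NA, NA, 5, 6),  # 4 of 6 values missing
  half_na   = c(1, NA, 3, NA, 5, NA),   # exactly 3 of 6 values missing
  complete  = letters[1:6]              # no missing values
)

# is.na() on a data frame returns a logical matrix;
# colMeans() then gives the fraction of missing values per column
na_frac <- colMeans(is.na(toy))

# Keep columns with strictly less than half missing (same `<` test as above)
keep <- na_frac < 0.5
toy_subset <- toy[, keep, drop = FALSE]

names(toy_subset)
# "id" "complete"
```

Note that `half_na` is dropped because exactly half of its values are missing and the comparison is strict. Use `<=` instead if such columns should be kept. `drop = FALSE` keeps the result a data frame even when only one column survives.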


 