Subsetting variables with missing values in R

Question

I have a dataset with 50 variables (columns), and 30 of them have missing values in more than half of their observations.

I want a subset of the dataset with those 30 variables removed. I could drop them one by one, but is there a quicker way to do it in R?

Answer 1

Logic: First, iterate over the columns with sapply and check which columns have fewer than half of their values missing. The result of the first line is a logical vector, which the second line uses to subset the data.

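# TRUE for each column where fewer than half of the values are NA
ind <- sapply(colnames(df), function(x) sum(is.na(df[[x]])) < nrow(df)/2)
# Keep only the columns flagged TRUE
df <- df[colnames(df)[ind]]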
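
As a possible alternative, the same filter can be sketched in vectorized form with colMeans, which gives the fraction of missing values per column directly. The toy data frame and the names na_frac and df_kept below are made up for illustration and are not part of the original answer; the 0.5 cutoff mirrors the "more than half" rule from the question.

```r
# Hypothetical toy data: column b is 75% NA, columns a and c are 25% NA
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(NA, NA, NA, 4),
  c = c("x", "y", "z", NA),
  stringsAsFactors = FALSE
)

# is.na(df) is a logical matrix; colMeans gives the share of NA per column
na_frac <- colMeans(is.na(df))

# Keep columns with fewer than half of their values missing;
# drop = FALSE keeps the result a data frame even if one column survives
df_kept <- df[, na_frac < 0.5, drop = FALSE]

print(names(df_kept))
```

Changing the cutoff (for example to 0.3) only requires editing the comparison, which may be easier to read than comparing counts against nrow(df)/2.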

1 answer

solution1
0 ACCEPTED 2017-01-26 14:12:42