Select columns with many observations

Question

I have a dataset with lots of observations and lots of variables, but some variables only have real (non-missing) values for a few observations. How can I delete variables that have fewer than, say, 500 non-missing observations?

I've been trying to figure out a way to do this in the context of dplyr, but select() doesn't seem to work that way.

This doesn't quite make sense either, but it's the direction I've been thinking:

dat[,sum(!is.na) > 500]

Answer 1

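We can use vapply to keep the columns that have at least 500 non-missing values. FUN.VALUE is logical(1) so that the result is a logical mask rather than numeric column positions.

dat[vapply(dat, function(x) sum(!is.na(x)) >= 500, logical(1))]

Or with Filter

Filter(function(x) sum(!is.na(x)) >= 500, dat)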
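
Closer to the direction sketched in the question, the count of non-missing values per column can also be computed in one step with colSums. The following is only a minimal sketch on made-up data: the toy data frame, the names min_obs, n_obs and kept, and the threshold of 3 (standing in for 500) are assumptions for illustration. The dplyr part assumes dplyr 1.0.0 or later is installed and is skipped otherwise.

```r
# Toy data: a threshold of 3 stands in for 500 (values are illustrative only)
dat <- data.frame(
  a = c(1, 2, 3, 4, 5),         # 5 non-missing values
  b = c(1, NA, NA, NA, 5),      # 2 non-missing values
  c = c(NA, "x", "y", "z", NA)  # 3 non-missing values
)
min_obs <- 3

# Count non-missing values per column, then keep columns reaching the threshold
n_obs <- colSums(!is.na(dat))
kept <- dat[, n_obs >= min_obs, drop = FALSE]
names(kept)  # "a" "c"

# Optional dplyr equivalent (assumption: dplyr >= 1.0.0 is available)
if (requireNamespace("dplyr", quietly = TRUE)) {
  kept_dplyr <- dplyr::select(dat, where(function(x) sum(!is.na(x)) >= min_obs))
}
```

drop = FALSE keeps the result a data frame even if only one column passes the threshold.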
