I have a dataset with many observations and many variables, but some variables have non-missing values for only a few observations. How can I delete the variables that have fewer than, say, 500 non-missing observations?
I've been trying to figure out a way to do this with dplyr, but select() doesn't seem to work that way.
This doesn't work either, but it's the direction I've been thinking in:
dat[,sum(!is.na) > 500]
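The bracket-indexing idea above can be made to work in base R by counting the non-missing values in every column at once with colSums(). Below is a minimal sketch on a toy data frame; the column names and the threshold min_obs = 3 (standing in for 500) are assumptions made only for illustration.

```r
# Toy data frame standing in for `dat`: columns with different amounts of NA
dat <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, 3, NA),
  c = c(1, NA, 3, 4)
)

min_obs <- 3  # hypothetical threshold, stands in for 500

# !is.na(dat) is a logical matrix; colSums() counts the TRUEs per column
keep <- colSums(!is.na(dat)) >= min_obs

# drop = FALSE keeps the result a data frame even if only one column survives
res <- dat[, keep, drop = FALSE]
names(res)  # "a" "c"
```

Column b has only one non-missing value, so it is the one dropped.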
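We can use vapply to build a logical vector marking the columns with at least 500 non-missing values, and index with it:
dat[vapply(dat, function(x) sum(!is.na(x)) >= 500, logical(1))]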
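A self-contained check of the vapply approach on a toy data frame, assuming a hypothetical threshold min_obs = 3 in place of 500. The FUN.VALUE argument matters here: logical(1) makes vapply return a logical vector, which [ uses as a column filter. Passing 0 instead would promote the logical results to 0/1 numbers, and [ would then treat them as column positions (selecting column 1 repeatedly) rather than as a filter.

```r
dat <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, 3, NA),
  c = c(1, NA, 3, 4)
)
min_obs <- 3  # hypothetical threshold, stands in for 500

# One TRUE/FALSE per column: does it have at least min_obs non-missing values?
keep <- vapply(dat, function(x) sum(!is.na(x)) >= min_obs, logical(1))
# keep is c(a = TRUE, b = FALSE, c = TRUE)

# Logical indexing with a single argument selects columns of a data frame
res <- dat[keep]
```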
Or with Filter, which keeps the columns for which the function returns TRUE:
Filter(function(x) sum(!is.na(x)) >= 500, dat)
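Since the question asks about dplyr: in recent versions, select() accepts a predicate through where(), so the same filter can be written inside a pipeline. This is a sketch that assumes dplyr 1.0.0 or later is installed; the toy data and min_obs = 3 are again hypothetical stand-ins for the real data and 500.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where select() supports where()

dat <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, 3, NA),
  c = c(1, NA, 3, 4)
)
min_obs <- 3  # hypothetical threshold, stands in for 500

# where() applies the formula to each column and keeps those returning TRUE
res <- dat %>%
  select(where(~ sum(!is.na(.x)) >= min_obs))
names(res)  # "a" "c"
```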