
Select columns with many observations

I have a dataset with lots of observations and lots of variables, but some variables only have non-missing values for a few observations. How can I delete the variables that have fewer than, say, 500 non-missing observations?

I've been trying to figure out a way to do this with dplyr, but select() doesn't seem to work that way.

This doesn't quite work either, but it's the direction I've been thinking in:

dat[,sum(!is.na) > 500]
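A working base R version of that idea, as a minimal sketch: colSums(!is.na(dat)) counts the non-missing values in each column, and the resulting comparison can index the columns directly. The toy data frame and the threshold min_obs = 3 are made up for illustration; with the real data the threshold would be 500.

```r
# Hypothetical toy data: `a` is fully observed, `b` has one value, `c` has three.
dat <- data.frame(
  a = 1:4,
  b = c(NA, NA, NA, 1),
  c = c(1, NA, 2, 3)
)

# Hypothetical threshold; the question uses 500 on a much larger dataset.
min_obs <- 3

# is.na(dat) gives a logical matrix; colSums counts the non-missing values per column.
n_obs <- colSums(!is.na(dat))
print(n_obs)

# Keep only the columns with at least `min_obs` non-missing values.
# drop = FALSE keeps a data frame even if only one column survives.
kept <- dat[, n_obs >= min_obs, drop = FALSE]
print(names(kept))
```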

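We can use vapply to build a logical vector that marks the columns with at least 500 non-missing values, and then use it to subset dat:

dat[vapply(dat, function(x) sum(!is.na(x)) >= 500, logical(1))]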
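The FUN.VALUE template matters here. vapply promotes a logical result to double when the template is numeric (such as 0), and a numeric index of 1s and 0s then selects columns by position rather than by TRUE/FALSE. The sketch below uses a hypothetical toy data frame and threshold to show the difference.

```r
# Hypothetical toy data and threshold, for illustration only.
dat <- data.frame(
  a = 1:4,
  b = c(NA, NA, NA, 1),
  c = c(1, NA, 2, 3)
)
min_obs <- 3

# Predicate: does this column have at least `min_obs` non-missing values?
has_enough <- function(x) sum(!is.na(x)) >= min_obs

# With FUN.VALUE = 0, the logical results are promoted to double: a = 1, b = 0, c = 1.
idx_num <- vapply(dat, has_enough, 0)
print(idx_num)

# Numeric indexing drops the 0 and selects position 1 twice,
# so this returns column `a` twice instead of `a` and `c`.
print(names(dat[idx_num]))

# A logical(1) template keeps the result logical, so indexing works as intended.
idx_lgl <- vapply(dat, has_enough, logical(1))
print(names(dat[idx_lgl]))
```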

Or with Filter, which keeps the columns for which the function returns TRUE:

Filter(function(x) sum(!is.na(x)) >= 500, dat)
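Since the question mentions dplyr: in dplyr 1.0.0 and later, select() accepts the where() selection helper (from tidyselect), which applies a predicate to every column and keeps those returning TRUE. A sketch with a hypothetical toy data frame and threshold:

```r
library(dplyr)

# Hypothetical toy data and threshold, for illustration only.
dat <- data.frame(
  a = 1:4,
  b = c(NA, NA, NA, 1),
  c = c(1, NA, 2, 3)
)
min_obs <- 3

# where() evaluates the lambda on each column; select() keeps the columns
# with at least `min_obs` non-missing values.
kept <- dat %>% select(where(~ sum(!is.na(.x)) >= min_obs))
print(names(kept))
```

With the real data, min_obs would be 500.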
