
How can I find the names of columns in a data frame that satisfy a condition?

I would like to know (by name) which columns in my data frame satisfy a particular condition. For example, if I were looking for the names of any columns that contain more than 3 NAs, how could I proceed?

>frame
  m  n  o  p
1 0 NA NA NA
2 0  2  2  2
3 0 NA NA NA
4 0 NA NA  1
5 0 NA NA NA
6 0  1  2  3
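> for (i in frame) {
    na <- is.na(i)      # TRUE where the value is NA
    total <- sum(na)    # sum() counts the TRUE values directly
    if (total > 3) {
      print(i)          # prints the column's values, not its name
    }
  }
[1] NA  2 NA NA NA  1
[1] NA  2 NA NA NA  2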

So this does identify the columns that satisfy the condition; however, it prints their values rather than their names. Perhaps subsetting the columns of interest would be another way to do it, but I'm not sure how to do that either. Besides, I'd prefer a way to get the names directly.
I'd appreciate any input.

We can use colSums on the logical matrix is.na(frame) to count the NAs in each column, check whether each count is greater than 3 to get a logical vector, and then use that vector to subset the names of 'frame'.

names(frame)[colSums(is.na(frame))>3]
#[1] "n" "o"
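To see what each piece of the one-liner does, here is a minimal sketch that rebuilds the example data and keeps the intermediate results. The values are copied from the question's printout; numeric columns are an assumption, since the printout doesn't show the types, and the names na_mat, na_count and keep are only illustrative. The last line also shows the subsetting approach mentioned in the question: the same logical vector selects the matching columns themselves.

```r
# Example data copied from the question's printout (numeric columns assumed)
frame <- data.frame(
  m = c(0, 0, 0, 0, 0, 0),
  n = c(NA, 2, NA, NA, NA, 1),
  o = c(NA, 2, NA, NA, NA, 2),
  p = c(NA, 2, NA, 1, NA, 3)
)

na_mat   <- is.na(frame)      # logical matrix, TRUE where a value is missing
na_count <- colSums(na_mat)   # NAs per column: m = 0, n = 4, o = 4, p = 3
keep     <- na_count > 3      # named logical vector: FALSE TRUE TRUE FALSE

names(frame)[keep]            # the column names: "n" "o"

# Subsetting the columns themselves with the same logical vector;
# drop = FALSE keeps a data frame even if only one column matches
frame[, keep, drop = FALSE]
```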

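If we are using dplyr (version 1.0.0 or later, which provides across()), one way is

library(dplyr)
frame %>%
  summarise(across(everything(), ~ sum(is.na(.x)) > 3)) %>%  # one TRUE/FALSE per column
  unlist() %>%                                                 # named logical vector
  names(.)[.]                                                  # keep the names that are TRUE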
#[1] "n" "o"
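The loop from the question can also be made to report names: iterate over names(frame) instead of over the columns, and look each column up with [[. A minimal sketch, wrapped in a hypothetical helper cols_with_many_na() (the function name and its max_na parameter are illustrative, not from any package):

```r
# Hypothetical helper: names of the columns of `df` that have more than
# `max_na` missing values, using the same loop idea as the question
cols_with_many_na <- function(df, max_na = 3) {
  hits <- character(0)                # names collected so far
  for (nm in names(df)) {             # loop over column names, not column values
    total <- sum(is.na(df[[nm]]))     # number of NAs in this column
    if (total > max_na) {
      hits <- c(hits, nm)             # record the name instead of printing values
    }
  }
  hits
}
```

With the frame from the question, cols_with_many_na(frame) should return "n" "o", matching the answers above. The colSums one-liner is shorter, but the loop form is easy to adapt to other per-column conditions.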

