

Keep only columns that meet a criterion

I have a large data frame whose values are either TRUE, FALSE, or NA. I want to keep only the columns that contain at least one TRUE value. How do I achieve this?

Here's a minimal example:

df <- data.frame(
   c1 = c(FALSE,FALSE,FALSE,FALSE),
   c2 = c(FALSE,TRUE,FALSE,NA),
   c3 = c(FALSE,NA,TRUE,NA),
   c4 = c(FALSE,FALSE,NA,NA)
 )
> df
     c1    c2    c3    c4
1 FALSE FALSE FALSE FALSE
2 FALSE  TRUE    NA FALSE
3 FALSE FALSE  TRUE    NA
4 FALSE    NA    NA    NA

I want to remove columns c1 and c4, and keep only c2 and c3. I know that TRUE values exist in my original, larger data frame (using table(df == TRUE)), but I don't know which function(s) to use to identify the columns they are in.
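table(df == TRUE) counts TRUE values over the whole data frame, so it cannot tell you where they are. A minimal sketch of the same count taken per column, reusing the example df: colSums() treats TRUE as 1 and FALSE as 0, and na.rm = TRUE drops the NAs. The name true_counts is only for illustration.

```r
df <- data.frame(
  c1 = c(FALSE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, TRUE, FALSE, NA),
  c3 = c(FALSE, NA, TRUE, NA),
  c4 = c(FALSE, FALSE, NA, NA)
)

# Number of TRUE values in each column (TRUE -> 1, FALSE -> 0, NA dropped)
true_counts <- colSums(df, na.rm = TRUE)
true_counts
#> c1 c2 c3 c4
#>  0  1  1  0

# Names of the columns that contain at least one TRUE
names(df)[true_counts > 0]
#> [1] "c2" "c3"
```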

We can use select with where and any (na.rm = TRUE makes any ignore the NAs, and the is.logical check skips any non-logical columns):

library(dplyr)
df %>%
   select(where(~ is.logical(.x) && any(.x, na.rm = TRUE)))

-output

     c2    c3
1 FALSE FALSE
2  TRUE    NA
3 FALSE  TRUE
4    NA    NA
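The na.rm = TRUE argument matters because any() follows R's three-valued logic: a column holding only FALSE and NA values gives NA rather than FALSE, and where() expects its predicate to return a single TRUE or FALSE. The sketch below shows that behaviour, then applies the same predicate without dplyr through base R's Filter(), which keeps the list elements (here, data frame columns) for which the function returns TRUE. The name kept is illustrative.

```r
df <- data.frame(
  c1 = c(FALSE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, TRUE, FALSE, NA),
  c3 = c(FALSE, NA, TRUE, NA),
  c4 = c(FALSE, FALSE, NA, NA)
)

any(c(FALSE, NA))                # NA: the missing value might have been TRUE
any(c(FALSE, NA), na.rm = TRUE)  # FALSE: NAs are ignored
any(c(TRUE, NA))                 # TRUE: one TRUE is enough

# Base R equivalent of select(where(...)): keep columns where the predicate is TRUE
kept <- Filter(function(x) any(x, na.rm = TRUE), df)
names(kept)
#> [1] "c2" "c3"
```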

Or, in base R, use colSums on the columns and check whether the sum is greater than 0 (TRUE -> 1 and FALSE -> 0; na.rm = TRUE drops the NAs):

df[colSums(df, na.rm = TRUE) > 0]

-output

     c2    c3
1 FALSE FALSE
2  TRUE    NA
3 FALSE  TRUE
4    NA    NA
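One caveat with the colSums approach: it relies on every column being logical (or numeric). If the real data frame also had, say, a column of string IDs, as.matrix() would turn the whole thing into a character matrix and colSums would stop with an error. A sketch of a base R variant that mirrors the is.logical guard in the dplyr answer, using vapply(); the id column is a hypothetical addition, not part of the question's data.

```r
df <- data.frame(
  id = c("a", "b", "c", "d"),        # hypothetical non-logical column
  c1 = c(FALSE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, TRUE, FALSE, NA),
  c3 = c(FALSE, NA, TRUE, NA),
  c4 = c(FALSE, FALSE, NA, NA)
)

# colSums(df, na.rm = TRUE) errors here, because the data frame is no longer
# all logical. Instead, test each column separately:
keep <- vapply(df, function(x) is.logical(x) && any(x, na.rm = TRUE), logical(1))
keep
df[keep]
```

vapply() with logical(1) guarantees exactly one TRUE or FALSE per column, and because df[keep] uses single-index (list-style) subsetting, the result stays a data frame even if only one column survives.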

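Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.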
