Keep only columns that meet a criterion
I have a large data frame whose values are either TRUE, FALSE, or NA. I want to keep only the columns that contain at least one TRUE value. How do I achieve this?
Here's a minimal example:
df <- data.frame(
c1 = c(FALSE,FALSE,FALSE,FALSE),
c2 = c(FALSE,TRUE,FALSE,NA),
c3 = c(FALSE,NA,TRUE,NA),
c4 = c(FALSE,FALSE,NA,NA)
)
> df
c1 c2 c3 c4
1 FALSE FALSE FALSE FALSE
2 FALSE TRUE NA FALSE
3 FALSE FALSE TRUE NA
4 FALSE NA NA NA
I want to remove columns c1 and c4, and keep only c2 and c3. I know that TRUE values exist in my original larger data frame (using table(df==TRUE)), but I don't know which function(s) to use to identify their columns.
We can use select with where and any:
library(dplyr)
df %>%
select(where(~ is.logical(.x) && any(.x, na.rm = TRUE)))
-output
c2 c3
1 FALSE FALSE
2 TRUE NA
3 FALSE TRUE
4 NA NA
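The na.rm = TRUE argument is what makes the predicate safe for columns like c4. A minimal sketch of how any() treats NA, using the example columns as plain vectors (as far as I know, where() expects the predicate to return a single TRUE or FALSE, so an NA result would be a problem):

```r
# Columns copied from the example data frame
c2 <- c(FALSE, TRUE, FALSE, NA)
c4 <- c(FALSE, FALSE, NA, NA)

# Without na.rm, a column with no TRUE but some NA is undecided
without_narm_c4 <- any(c4)                # NA
# With na.rm = TRUE the NAs are dropped before testing
with_narm_c4 <- any(c4, na.rm = TRUE)     # FALSE

# A single TRUE is decisive either way
without_narm_c2 <- any(c2)                # TRUE
with_narm_c2 <- any(c2, na.rm = TRUE)     # TRUE
```

The is.logical(.x) check in the predicate only guards against non-logical columns; it can be dropped if every column is known to be logical.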
Or in base R, apply colSums to the columns and check whether each sum is greater than 0 (TRUE -> 1 and FALSE -> 0):
df[colSums(df, na.rm = TRUE) > 0]
-output
c2 c3
1 FALSE FALSE
2 TRUE NA
3 FALSE TRUE
4 NA NA
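If the real data frame might also contain non-logical columns (an assumption, since the question only mentions TRUE, FALSE, and NA), colSums can fail because it needs numeric or logical input. A base R sketch that mirrors the dplyr predicate with vapply; the names keep and result are only illustrative:

```r
df <- data.frame(
  c1 = c(FALSE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, TRUE, FALSE, NA),
  c3 = c(FALSE, NA, TRUE, NA),
  c4 = c(FALSE, FALSE, NA, NA)
)

# One TRUE/FALSE per column: is it logical and does it hold at least one TRUE?
keep <- vapply(
  df,
  function(col) is.logical(col) && any(col, na.rm = TRUE),
  logical(1)
)

# Single-bracket indexing with a logical vector keeps the selected columns
result <- df[keep]
names(result)  # "c2" "c3"
```

names(df)[keep] gives just the matching column names, which answers the "identify their columns" part of the question directly.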