Remove columns from a dataframe based on number of rows with valid values
I have a dataframe:
df = data.frame(gene = c("a", "b", "c", "d", "e"),
value1 = c(NA, NA, NA, 2, 1),
value2 = c(NA, 1, 2, 3, 4),
value3 = c(NA, NA, NA, NA, 1))
I would like to keep all columns (plus the first, gene) that have at least 2 valid values (i.e., not NA). How do I do this?
I am thinking of something like this:
df1 = df %>% select_if(function(.) ...)
Thanks
We can use sum to count the non-NA elements in each column and build a logical condition to select the columns of interest:
library(dplyr)
df1 <- df %>%
select_if(~ sum(!is.na(.)) > 2)
df1
# gene value2
#1 a NA
#2 b 1
#3 c 2
#4 d 3
#5 e 4
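In dplyr 1.0.0 and later, select_if is superseded by select combined with where. The following is a minimal sketch of the same filter in that style, assuming a recent dplyr is installed. The name min_valid is introduced here purely for illustration. Note that the condition > 2 above keeps columns with three or more non-NA values, whereas the question's "at least 2" corresponds to >= 2, which also keeps value1.

```r
library(dplyr)

df <- data.frame(gene = c("a", "b", "c", "d", "e"),
                 value1 = c(NA, NA, NA, 2, 1),
                 value2 = c(NA, 1, 2, 3, 4),
                 value3 = c(NA, NA, NA, NA, 1))

# min_valid is a hypothetical name for the minimum count of non-NA values
min_valid <- 2

# where() applies the predicate to each column and keeps those returning TRUE
df2 <- df %>%
  select(where(~ sum(!is.na(.x)) >= min_valid))

names(df2)
# [1] "gene"   "value1" "value2"
```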
Another option is keep from purrr:
library(purrr)
keep(df, ~ sum(!is.na(.x)) > 2)
Or create the condition based on the proportion of rows with non-NA values (here, more than half):
df %>%
select_if(~ mean(!is.na(.)) > 0.5)
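To see why this gives the same result as the count-based version for this data, it can help to print the per-column fraction of non-NA values. A small base R sketch, with df redefined so the snippet runs on its own (frac_valid is a name chosen here for illustration):

```r
df <- data.frame(gene = c("a", "b", "c", "d", "e"),
                 value1 = c(NA, NA, NA, 2, 1),
                 value2 = c(NA, 1, 2, 3, 4),
                 value3 = c(NA, NA, NA, NA, 1))

# Fraction of non-NA values in each column
frac_valid <- colMeans(!is.na(df))
frac_valid
#   gene value1 value2 value3
#    1.0    0.4    0.8    0.2

# With 5 rows, a fraction above 0.5 means at least 3 non-NA values
names(df)[frac_valid > 0.5]
# [1] "gene"   "value2"
```

A proportion threshold scales with the number of rows, so it is only equivalent to a fixed count for a given row count.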
Or use Filter from base R:
Filter(function(x) sum(!is.na(x)) > 2, df)
We can use colSums in base R to count the non-NA values per column:
df[colSums(!is.na(df)) > 2]
# gene value2
#1 a NA
#2 b 1
#3 c 2
#4 d 3
#5 e 4
Or using apply:
df[apply(!is.na(df), 2, sum) > 2]
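All of the approaches above keep gene only because it happens to contain no NA values. If the identifier column should be kept regardless, one possible sketch is a small wrapper around the colSums approach. The function keep_valid_cols and its arguments min_valid and always_keep are hypothetical names introduced here, not part of base R or any package.

```r
# keep_valid_cols is a hypothetical helper written for this example
keep_valid_cols <- function(data, min_valid = 2, always_keep = 1) {
  # Count non-NA values per column
  n_valid <- colSums(!is.na(data))
  keep <- n_valid >= min_valid
  # Force the identifier column(s) to be kept regardless of their NA count
  keep[always_keep] <- TRUE
  data[, keep, drop = FALSE]
}

df <- data.frame(gene = c("a", "b", "c", "d", "e"),
                 value1 = c(NA, NA, NA, 2, 1),
                 value2 = c(NA, 1, 2, 3, 4),
                 value3 = c(NA, NA, NA, NA, 1))

res_default <- keep_valid_cols(df)                # at least 2 non-NA values
names(res_default)
# [1] "gene"   "value1" "value2"

res_strict <- keep_valid_cols(df, min_valid = 6)  # no column has 6 valid values
names(res_strict)
# [1] "gene"
```

Using drop = FALSE keeps the result a data frame even when only one column survives.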