
Remove columns from a dataframe based on number of rows with valid values

I have a dataframe:

df = data.frame(gene = c("a", "b", "c", "d", "e"),
                value1 = c(NA, NA, NA, 2, 1),
                value2 = c(NA, 1, 2, 3, 4),
                value3 = c(NA, NA, NA, NA, 1))

I would like to keep every column (plus the first, gene) that has at least 2 valid values (i.e., not NA). How do I do this?

I am thinking of something like this ...

df1 = df %>% select_if(function(.) ...)

Thanks

We can sum the non-NA elements in each column and use that count in a logical condition to select the columns of interest

library(dplyr)
# keep columns with at least 2 non-NA values, as the question asks
df1 <- df %>%
          select_if(~ sum(!is.na(.)) >= 2)
df1
#  gene value1 value2
#1    a     NA     NA
#2    b     NA      1
#3    c     NA      2
#4    d      2      3
#5    e      1      4
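
In dplyr 1.0.0 and later, select_if is superseded by select() combined with where(). Below is a minimal sketch of the same count-based selection in that style, using the "at least 2 valid values" threshold from the question. The names min_valid and df_kept are hypothetical, introduced only for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(gene = c("a", "b", "c", "d", "e"),
                 value1 = c(NA, NA, NA, 2, 1),
                 value2 = c(NA, 1, 2, 3, 4),
                 value3 = c(NA, NA, NA, NA, 1))

# Hypothetical name for the minimum number of non-NA values per column
min_valid <- 2

# where() applies the predicate to each column and keeps those returning TRUE
df_kept <- df %>%
  select(where(~ sum(!is.na(.x)) >= min_valid))

names(df_kept)
```

Keeping the threshold in a variable makes it easy to switch between "at least 2" and a stricter cutoff without touching the predicate.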

Another option is keep from purrr

library(purrr)
keep(df, ~ sum(!is.na(.x)) >= 2)

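Or express the condition as a proportion of the rows, for example keeping columns in which more than half of the values are non-NA (with 5 rows, that means at least 3 valid values)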

df %>%
   select_if(~ mean(!is.na(.)) > 0.5)
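
To see how a count threshold and a proportion threshold relate on this data, it can help to print both per column. This is only a sketch; valid_counts and valid_share are hypothetical names, not part of the original answer.

```r
# Example data from the question
df <- data.frame(gene = c("a", "b", "c", "d", "e"),
                 value1 = c(NA, NA, NA, 2, 1),
                 value2 = c(NA, 1, 2, 3, 4),
                 value3 = c(NA, NA, NA, NA, 1))

# !is.na(df) is a logical matrix; column sums give counts, column means give shares
valid_counts <- colSums(!is.na(df))   # gene 5, value1 2, value2 4, value3 1
valid_share  <- colMeans(!is.na(df))  # the same counts divided by nrow(df)

valid_counts
valid_share

# Columns in which more than half of the rows are non-NA
names(df)[valid_share > 0.5]
```

On this data the proportion rule keeps gene and value2 only, since value1 has just 2 valid values out of 5.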

Or use Filter from base R

Filter(function(x) sum(!is.na(x)) >= 2, df)

We can use colSums in base R to count the non-NA values per column

# keep columns with at least 2 non-NA values
df[colSums(!is.na(df)) >= 2]

#  gene value1 value2
#1    a     NA     NA
#2    b     NA      1
#3    c     NA      2
#4    d      2      3
#5    e      1      4

Or using apply

df[apply(!is.na(df), 2, sum) >= 2]
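
If the same filter is needed in several places, the base R approaches above could be wrapped in a small helper. This is a sketch under stated assumptions: keep_cols_min_valid and its min_valid argument are hypothetical names, not an existing function.

```r
# Example data from the question
df <- data.frame(gene = c("a", "b", "c", "d", "e"),
                 value1 = c(NA, NA, NA, 2, 1),
                 value2 = c(NA, 1, 2, 3, 4),
                 value3 = c(NA, NA, NA, NA, 1))

# Hypothetical helper: keep columns with at least `min_valid` non-NA values
keep_cols_min_valid <- function(data, min_valid = 2) {
  # Count the non-NA entries of each column; vapply guarantees an integer vector
  n_valid <- vapply(data, function(col) sum(!is.na(col)), integer(1))
  # drop = FALSE keeps the result a data frame even if only one column survives
  data[, n_valid >= min_valid, drop = FALSE]
}

keep_cols_min_valid(df)                 # columns with at least 2 valid values
keep_cols_min_valid(df, min_valid = 3)  # stricter cutoff
```

Iterating over columns with vapply avoids converting the data frame to a matrix first, which is what is.na(df) does in the colSums and apply versions.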

