简体   繁体   English

如何根据多个条件过滤具有 NA 的行

[英]How to filter rows with NA based on multiple conditions

How to drop rows with NA (missing values) based on multiple conditions in group column?如何根据组列中的多个条件删除具有 NA(缺失值)的行?

Here is the dummy data:这是虚拟数据:

a <- data.frame(c('b1', 'b2', 'b3', 'b4'),
                c(NA, 1.5, 0.5, 1),
                c(0.4, NA, 0.3, NA),
                c(0.5, NA, NA,2),
                c(-0.5, -2.5, -0.2,NA),
                c(NA, NA, -0.4,NA),
                c(-0.5, NA, -0.4,NA),
                stringsAsFactors = FALSE)

colnames(a) <- c('id', 'group1_1', 'group1_2', 'group1_3', 'group2_1', 'group2_2', 'group2_3')
rownames(a) <- a$id

a_subset <- a[, 2:7]
a_subset

#    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1       NA      0.4      0.5     -0.5       NA     -0.5
# b2      1.5       NA       NA     -2.5       NA       NA
# b3      0.5      0.3       NA     -0.2     -0.4     -0.4
# b4      1.0       NA      2.0       NA       NA       NA

As you can see from above dataframe, group_1 and group_2 contains some missing values, and each group has triplicates.从上面的数据框中可以看出,group_1 和 group_2 包含一些缺失值,并且每个组都有一式三份。

Expected output:预期输出:

#    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1       NA      0.4      0.5     -0.5       NA     -0.5
# b3      0.5      0.3       NA     -0.2     -0.4     -0.4
# b4      1.0       NA      2.0       NA       NA       NA

From above you can see if 1 group contains at least 2 values it will not be removed.从上面您可以看到 1 个组是否包含至少 2 个值,它不会被删除。

Is it possible with dplyr approach? dplyr 方法可以吗?

tidyverse tidyverse


library(tidyverse)

a_subset %>% 
  filter(
    rowSums(!is.na(across(starts_with("group1_")))) >= 2 | 
      rowSums(!is.na(across(starts_with("group2_")))) >= 2)

#>    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
#> b1       NA      0.4      0.5     -0.5       NA     -0.5
#> b3      0.5      0.3       NA     -0.2     -0.4     -0.4
#> b4      1.0       NA      2.0       NA       NA       NA

data数据

a_subset <- data.frame(
   row.names = c("b1", "b2", "b3", "b4"),
    group1_1 = c(NA, 1.5, 0.5, 1),
    group1_2 = c(0.4, NA, 0.3, NA),
    group1_3 = c(0.5, NA, NA, 2),
    group2_1 = c(-0.5, -2.5, -0.2, NA),
    group2_2 = c(NA, NA, -0.4, NA),
    group2_3 = c(-0.5, NA, -0.4, NA)
)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM