Ignore NA values in filtering with dplyr
I have a data frame such as:
> tab
Groups Species Value
1 Group1 Sp1 1
2 Group1 Sp1 4
3 Group1 Sp2 78
4 Group1 Sp3 NA
5 Group1 Sp4 NA
6 Group2 Sp2 3
7 Group2 Sp3 9
8 Group2 Sp4 8
9 Group3 Sp1 9
10 Group3 Sp3 10
11 Group3 Sp3 110
12 Group3 Sp3 14
I am trying to keep only the groups in which every Value is below 80.
I tried:
tab %>%
group_by(Groups) %>%
filter(all(Value < 80))
But I do not know how to ignore the NA values in the filter.
Here I should get:
> tab
Groups Species Value
1 Group1 Sp1 1
2 Group1 Sp1 4
3 Group1 Sp2 78
4 Group1 Sp3 NA
5 Group1 Sp4 NA
6 Group2 Sp2 3
7 Group2 Sp3 9
8 Group2 Sp4 8
Does anyone have a solution? Thanks.
If I also have:
> tab
Groups Species Value sp mrca
1 Group1 Sp1 1 3 3
2 Group1 Sp1 4 3 3
3 Group1 Sp2 78 NA NA
4 Group1 Sp3 NA 3 12
5 Group1 Sp4 NA 3 3
6 Group2 Sp2 3 2 3
7 Group2 Sp3 9 2 40
8 Group2 Sp4 8 NA NA
9 Group3 Sp1 9 2 2
10 Group3 Sp3 10 3 3
11 Group3 Sp3 110 3 2
12 Group3 Sp3 14 2 3
I would like to keep all groups in which every Value is < 80 and in which abs(sp - mrca) falls within 0:9.
Building on the answer, I tried:
tab %>%
group_by(Groups) %>%
filter(all(Value < 80 |is.na(Value))) %>%
filter((all(abs(sp - mrca) %in% 0:9)|is.na(sp) & is.na(mrca)))
But this does not seem to be the right code.
I should get:
> tab
Groups Species Value sp mrca
1 Group1 Sp1 1 3 3
2 Group1 Sp1 4 3 3
3 Group1 Sp2 78 NA NA
4 Group1 Sp3 NA 3 12
5 Group1 Sp4 NA 3 3
We can use an OR (|) condition together with is.na:
library(dplyr)
tab %>%
group_by(Groups) %>%
filter(all(Value < 80 |is.na(Value)))
# A tibble: 8 x 3
# Groups: Groups [2]
# Groups Species Value
# <chr> <chr> <int>
#1 Group1 Sp1 1
#2 Group1 Sp1 4
#3 Group1 Sp2 78
#4 Group1 Sp3 NA
#5 Group1 Sp4 NA
#6 Group2 Sp2 3
#7 Group2 Sp3 9
#8 Group2 Sp4 8
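The problem with the OP's code is that the comparison Value < 80 returns NA for the rows where Value is NA. When that result is wrapped in all, and no element is FALSE, all also returns NA instead of a logical TRUE/FALSE. By default, filter drops the rows for which the condition evaluates to NA, so the whole group disappears.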
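The NA propagation described above can be reproduced on a plain vector. This is a minimal sketch in base R; the vector `value` is a hypothetical stand-in for the Value column of Group1, and the `na.rm = TRUE` variant is shown as an alternative to the `| is.na()` form, not as something the answer itself uses.

```r
# Hypothetical stand-in for the Value column of Group1 in the question
value <- c(1L, 4L, 78L, NA, NA)

# Element-wise comparison: NA inputs give NA results
cmp <- value < 80

# No FALSE present but some NA, so all() cannot decide and returns NA
res_plain <- all(cmp)

# Treat NA rows as passing by OR-ing with is.na()
res_or <- all(cmp | is.na(value))

# Alternative: drop NA comparisons before reducing
res_narm <- all(cmp, na.rm = TRUE)

# If any element is FALSE, all() returns FALSE even when NA is present
res_false <- all(c(110L, NA) < 80)

print(c(plain = res_plain, or = res_or, narm = res_narm, false = res_false))
```

Inside a grouped filter, either `all(Value < 80 | is.na(Value))` or `all(Value < 80, na.rm = TRUE)` should keep Group1, since both reduce to TRUE for that group.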
For a better understanding, check the ind column produced by:
tab %>%
group_by(Groups) %>%
mutate(ind = all(Value < 80))
and compare it with:
tab %>%
group_by(Groups) %>%
mutate(ind = all(Value < 80| is.na(Value)))
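For readers without dplyr at hand, the same per-group flags can be computed in base R. This is a sketch only: the vectors `groups` and `value` are a cut-down copy of the question's data, and `tapply` is swapped in for `group_by` plus `mutate`, so it yields one flag per group rather than one per row.

```r
# Cut-down copy of the question's data (Groups and Value columns only)
groups <- c(rep("Group1", 5), rep("Group2", 3), rep("Group3", 4))
value  <- c(1L, 4L, 78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L)

# One flag per group without any NA handling
ind_plain <- tapply(value < 80, groups, all)

# One flag per group, counting NA values as passing
ind_na_ok <- tapply(value < 80 | is.na(value), groups, all)

print(ind_plain)  # Group1 is NA, Group2 is TRUE, Group3 is FALSE
print(ind_na_ok)  # Group1 and Group2 are TRUE, Group3 is FALSE
```

Only the second version gives Group1 a definite TRUE, which is why the first version loses that group when used as a filter condition.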
Or using data.table:
library(data.table)
setDT(tab)[, .SD[all(Value < 80 | is.na(Value))], Groups]
Or using base R:
tab[with(tab, ave(Value < 80 | is.na(Value), Groups, FUN = all)),]
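For the second dataset (tab1 in the data below), drop the NA differences with na.omit before testing them against 0:9: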
tab1 %>%
group_by(Groups) %>%
filter(all(Value < 80 |is.na(Value)),
all(na.omit(abs(sp-mrca)) %in% 0:9))
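The na.omit call matters because %in% never returns NA: a missing value that is not in the lookup set simply counts as FALSE. A small base R sketch, using the sp and mrca values of Group1 copied from the question (the variable names `d`, `in_range`, `strict` and `lenient` are made up for illustration):

```r
# sp and mrca for Group1, copied from the question's second data frame
sp   <- c(3L, 3L, NA, 3L, 3L)
mrca <- c(3L, 3L, NA, 12L, 3L)

# Absolute difference; the third element is NA
d <- abs(sp - mrca)

# %in% maps the NA to FALSE because 0:9 contains no NA
in_range <- d %in% 0:9

# Without na.omit, the NA row makes the whole group fail
strict <- all(in_range)

# With na.omit, only the observed differences are tested
lenient <- all(na.omit(d) %in% 0:9)

print(in_range)
print(c(strict = strict, lenient = lenient))
```

This is also why the attempt in the question does not work: `all(abs(sp - mrca) %in% 0:9)` is FALSE for Group1, so only the rows where both sp and mrca are NA pass the second filter.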
Data:
tab <- structure(list(Groups = c("Group1", "Group1", "Group1", "Group1",
"Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group3",
"Group3"), Species = c("Sp1", "Sp1", "Sp2", "Sp3", "Sp4", "Sp2",
"Sp3", "Sp4", "Sp1", "Sp3", "Sp3", "Sp3"), Value = c(1L, 4L,
78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
tab1 <- structure(list(Groups = c("Group1", "Group1", "Group1", "Group1",
"Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group3",
"Group3"), Species = c("Sp1", "Sp1", "Sp2", "Sp3", "Sp4", "Sp2",
"Sp3", "Sp4", "Sp1", "Sp3", "Sp3", "Sp3"), Value = c(1L, 4L,
78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L), sp = c(3L, 3L,
NA, 3L, 3L, 2L, 2L, NA, 2L, 3L, 3L, 2L), mrca = c(3L, 3L, NA,
12L, 3L, 3L, 40L, NA, 2L, 3L, 2L, 3L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
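We can use ave from base R together with subset. Remove the NA rows from the data, find the groups in which all remaining values are less than 80, and subset those groups from the original tab: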
subset(tab, Groups %in% unique(with(na.omit(tab),
Groups[ave(Value < 80, Groups, FUN = all)])))
# Groups Species Value
#1 Group1 Sp1 1
#2 Group1 Sp1 4
#3 Group1 Sp2 78
#4 Group1 Sp3 NA
#5 Group1 Sp4 NA
#6 Group2 Sp2 3
#7 Group2 Sp3 9
#8 Group2 Sp4 8