Apply a custom function after group_by using dplyr in R
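How can I use dplyr to drop groups that contain 2 or more consecutive NAs after group_by? I wrote a function that returns TRUE or FALSE depending on whether a column of a data frame contains a run of 2 or more consecutive NAs: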
# function for determining if ts contains consecutive NAs
is.na.contiguous <- function(df, consecutive) {
na.rle <- rle(is.na(df$b))
na.rle$values <- na.rle$values & na.rle$lengths >= consecutive
any(na.rle$values)
}
# example df
d = structure(list(a = c(1, 2, 3, 4, 5, 6, 7, 8),
                   b = c(1, 2, 2, NA, NA, 2, NA, 2),
                   c = c(1, 1, 1, 2, 2, 2, 3, 3)),
              class = "data.frame", row.names = c(NA, -8L))
d
#   a  b c
# 1 1  1 1
# 2 2  2 1
# 3 3  2 1
# 4 4 NA 2
# 5 5 NA 2
# 6 6  2 2
# 7 7 NA 3
# 8 8  2 3
# test function
is.na.contiguous(d, 2)
# [1] TRUE   (column b has 2 consecutive NAs)
is.na.contiguous(d, 3)
# [1] FALSE  (column b does not have 3 consecutive NAs)
Now, how do I apply this function to each group in the data frame? Here is my attempt:
d %>% group_by(c) %>% mutate(consecNA = is.na.contiguous(., 2)) %>% as.data.frame()
#   a  b c consecNA
# 1 1  1 1     TRUE
# 2 2  2 1     TRUE
# 3 3  2 1     TRUE
# 4 4 NA 2     TRUE
# 5 5 NA 2     TRUE
# 6 6  2 2     TRUE
# 7 7 NA 3     TRUE
# 8 8  2 3     TRUE
What am I doing wrong?
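Instead of passing the whole data frame to is.na.contiguous, pass only the column values. Applying it by group is then straightforward, and the function stays flexible if you want to run the same check on a different column.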
is.na.contiguous <- function(x, consecutive) {
na.rle <- rle(is.na(x))
na.rle$values <- na.rle$values & na.rle$lengths >= consecutive
any(na.rle$values)
}
library(dplyr)
d %>%
group_by(c) %>%
filter(!is.na.contiguous(b, 2))
#       a     b     c
#   <dbl> <dbl> <dbl>
# 1     1     1     1
# 2     2     2     1
# 3     3     2     1
# 4     7    NA     3
# 5     8     2     3
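As for why the attempt in the question printed TRUE on every row: inside a magrittr pipe, `.` stands for the whole object on the left-hand side, so the function saw the full column b for every group. The following is a minimal base R sketch of that difference. It uses `tapply` as a stand-in for the grouped dplyr call, and the names `whole` and `per_group` are illustrative only.

```r
# Same example values as in the question
d <- data.frame(
  a = 1:8,
  b = c(1, 2, 2, NA, NA, 2, NA, 2),
  c = c(1, 1, 1, 2, 2, 2, 3, 3)
)

# Vector version of the helper from the answer above
is.na.contiguous <- function(x, consecutive) {
  na.rle <- rle(is.na(x))
  any(na.rle$values & na.rle$lengths >= consecutive)
}

# Whole column at once: what the original attempt effectively evaluated
whole <- is.na.contiguous(d$b, 2)

# One result per value of c: what a per-group call evaluates
per_group <- tapply(d$b, d$c, is.na.contiguous, consecutive = 2)

print(whole)
print(per_group)
```

Only the group with c equal to 2 contains a run of two NAs, which is why passing the column b rather than `.` gives a different answer for each group.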
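One option is to apply rleid from data.table to the logical vector is.na(b) and group by the resulting run ids, then drop every run that has 2 or more rows and consists entirely of NA. Note that this removes only the rows of the NA run itself; the rest of that 'c' group is kept (the row with a = 6 in the output).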
library(data.table)
i1 <- setDT(d)[, .I[!(.N >=2 & all(is.na(b)))], rleid(is.na(b))]$V1
d[i1]
#    a  b c
# 1: 1  1 1
# 2: 2  2 1
# 3: 3  2 1
# 4: 6  2 2
# 5: 7 NA 3
# 6: 8  2 3
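To see what the run ids look like, here is a base R sketch that mimics rleid(is.na(b)) with `rle` and `rep`. It is an illustration rather than the data.table implementation, and the names `run_id`, `run_len`, and `keep` are not part of the answer.

```r
b <- c(1, 2, 2, NA, NA, 2, NA, 2)

# Run-length encode the NA indicator
runs <- rle(is.na(b))

# Number the runs in order and repeat each id over its run,
# which is what rleid(is.na(b)) returns for this vector
run_id <- rep(seq_along(runs$lengths), runs$lengths)
print(run_id)
# [1] 1 1 1 2 2 3 4 5

# Length of the run each row belongs to
run_len <- rep(runs$lengths, runs$lengths)

# Keep a row unless it sits in an all-NA run of length 2 or more
keep <- !(is.na(b) & run_len >= 2)
print(which(keep))
# [1] 1 2 3 6 7 8
```

The kept row numbers match the values of column a in the data.table output above.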
Or, if we also need to group by 'c':
setDT(d)[d[, .I[sum(is.na(b)) <2], .(grp = rleid(is.na(b)), c)]$V1]
Or with the tidyverse (rleid still comes from data.table):
library(dplyr)
d %>%
group_by(grp = rleid(is.na(b))) %>%
filter(!(n() >=2 & all(is.na(b))))
# A tibble: 6 x 4
# Groups: grp [4]
#       a     b     c   grp
#   <dbl> <dbl> <dbl> <int>
# 1     1     1     1     1
# 2     2     2     1     1
# 3     3     2     1     1
# 4     6     2     2     3
# 5     7    NA     3     4
# 6     8     2     3     5
Another option is to take the sum of the logical vector is.na(b) within each run and check that it is less than 2:
d %>%
group_by(c, grp = rleid(is.na(b))) %>%
filter(sum(is.na(b))<2)
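The sum check is equivalent to a consecutive-NA check here only because the grouping includes the run id grp. Without it, the sum would count every NA in a 'c' group, consecutive or not. A small base R sketch of the difference, using a hypothetical vector `x` that is not part of the example data:

```r
# Hypothetical vector with two NAs that are not adjacent
x <- c(NA, 1, NA)

# A plain count sees two NAs
total_na <- sum(is.na(x))

# The longest run of NAs is only one element long
runs <- rle(is.na(x))
longest_na_run <- max(c(0, runs$lengths[runs$values]))

print(total_na)
print(longest_na_run)
```

With a threshold of 2, the plain count would flag this vector while the run-based check would not.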
If we use the OP's function (rewritten to take a vector):
is.na.contiguous <- function(x, consecutive) {
na.rle <- rle(is.na(x))
with(na.rle, any(values & lengths >= consecutive))
}
d %>%
group_by(c) %>%
mutate(consecNA = is.na.contiguous(b, 2))
# A tibble: 8 x 4
# Groups: c [3]
#       a     b     c consecNA
#   <dbl> <dbl> <dbl> <lgl>
# 1     1     1     1 FALSE
# 2     2     2     1 FALSE
# 3     3     2     1 FALSE
# 4     4    NA     2 TRUE
# 5     5    NA     2 TRUE
# 6     6     2     2 TRUE
# 7     7    NA     3 FALSE
# 8     8     2     3 FALSE
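If one flag per group is enough, the same function can be used with summarise instead of mutate. This is a sketch under the assumption that dplyr is installed; the name `flags` is illustrative and does not appear in the answers.

```r
library(dplyr)

# Same example values as in the question
d <- data.frame(
  a = 1:8,
  b = c(1, 2, 2, NA, NA, 2, NA, 2),
  c = c(1, 1, 1, 2, 2, 2, 3, 3)
)

# Vector version of the OP's function
is.na.contiguous <- function(x, consecutive) {
  na.rle <- rle(is.na(x))
  any(na.rle$values & na.rle$lengths >= consecutive)
}

# One row per value of c, with a single logical flag for the group
flags <- d %>%
  group_by(c) %>%
  summarise(consecNA = is.na.contiguous(b, 2))

print(flags)
```

The result has three rows, one per group, and only the group with c equal to 2 is flagged.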