Remove rows based on first instance to meet a condition
In the following dataset, I want to remove, within each ID, every row from the first row where Var is TRUE onward, with rows ordered by Time. Put differently, for each ID I want to keep only the rows where Var is FALSE that come before the first TRUE, ordered by Time.
ID <- c('A','B','C','A','B','C','A','B','C','A','B','C')
Time <- c(3,3,3,6,6,6,9,9,9,12,12,12)
Var <- c(F,F,F,T,T,F,T,T,F,T,F,T)
data = data.frame(ID, Time, Var)
data
ID Time Var
1 A 3 FALSE
2 B 3 FALSE
3 C 3 FALSE
4 A 6 TRUE
5 B 6 TRUE
6 C 6 FALSE
7 A 9 TRUE
8 B 9 TRUE
9 C 9 FALSE
10 A 12 TRUE
11 B 12 FALSE
12 C 12 TRUE
The desired result for this data frame should be:
ID Time Var
A 3 FALSE
B 3 FALSE
C 3 FALSE
C 6 FALSE
C 9 FALSE
Note that the solution should not only remove rows where Var == TRUE; it should also remove rows where Var == FALSE if they come later in Time than a row where Var == TRUE for that ID (for example, the row B 12 FALSE above).
I've tried many different things but can't seem to figure this out. Any help is much appreciated!
Here's how to do that with dplyr, using group_by and cumsum.
The rationale is that Var is a logical vector, where FALSE counts as 0 and TRUE counts as 1. Within each ID, cumsum stays at 0 until it hits the first TRUE and is at least 1 from then on, so filtering on cumsum(Var) < 1 keeps only the rows before the first TRUE.
library(dplyr)
data %>%
  group_by(ID) %>%
  filter(cumsum(Var) < 1)
ID Time Var
<fctr> <dbl> <lgl>
1 A 3 FALSE
2 B 3 FALSE
3 C 3 FALSE
4 C 6 FALSE
5 C 9 FALSE
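To make the cumsum rationale concrete, here is a minimal base R sketch of the same idea without dplyr. It rebuilds the question's data, sorts explicitly by Time (the answers above rely on the rows already being in Time order), and uses ave to compute the running count per ID. The names running and result are only illustrative.

```r
# Rebuild the data from the question.
ID   <- c('A','B','C','A','B','C','A','B','C','A','B','C')
Time <- c(3,3,3,6,6,6,9,9,9,12,12,12)
Var  <- c(FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
data <- data.frame(ID, Time, Var)

# Sort by Time so that "first TRUE" means first in time within each ID.
data <- data[order(data$Time), ]

# Running count of TRUE values within each ID (FALSE = 0, TRUE = 1).
running <- ave(as.integer(data$Var), data$ID, FUN = cumsum)

# Keep only rows seen before the first TRUE of their ID.
result <- data[running < 1, ]
print(result)
```

If the data were not already ordered by Time, the dplyr version would presumably need a similar sorting step (for example with arrange) before the grouped filter.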
Here's the equivalent code with data.table:
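library(data.table)
setDT(data)  # convert the data.frame to a data.table by reference
# .I holds row numbers; keep those where the per-ID running count of TRUE is still 0
data[data[, .I[cumsum(Var) < 1], by = ID]$V1]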
ID Time Var
1: A 3 FALSE
2: B 3 FALSE
3: C 3 FALSE
4: C 6 FALSE
5: C 9 FALSE
This data.table solution should work.
library(data.table)
setDT(data)[, .SD[1:(which.max(Var) - 1)], by = ID]
ID Time Var
1: A 3 FALSE
2: B 3 FALSE
3: C 3 FALSE
4: C 6 FALSE
5: C 9 FALSE
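Given that you want all the values up to the first TRUE value, which.max is the way to go: on a logical vector it returns the position of the first TRUE. Note that this relies on every ID having at least one TRUE that is not in its first row. which.max returns 1 both when the first value is TRUE and when there is no TRUE at all, and in either case 1:(which.max(Var) - 1) becomes 1:0, which selects the first row.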
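The edge cases of which.max on a logical vector can be checked directly. The sketch below is only an illustration in base R; n_before_first_true is a hypothetical helper (not part of the answer or of any package) that counts the rows before the first TRUE and treats a vector with no TRUE as "keep everything".

```r
# which.max returns the position of the first maximum.
which.max(c(FALSE, TRUE, TRUE))   # 2: position of the first TRUE
which.max(c(FALSE, FALSE))        # 1: no TRUE, so the first FALSE is the maximum

# 1:0 counts down, so it is c(1, 0) rather than an empty index.
1:0

# Hypothetical helper: number of leading elements before the first TRUE.
# match() returns the position of the first TRUE, or nomatch if there is none.
n_before_first_true <- function(x) {
  first <- match(TRUE, x, nomatch = length(x) + 1L)
  first - 1L
}

n_before_first_true(c(FALSE, TRUE, FALSE))  # 1
n_before_first_true(c(TRUE, FALSE))         # 0
n_before_first_true(c(FALSE, FALSE))        # 2

# seq_len(0) is an empty index, unlike 1:0.
seq_len(n_before_first_true(c(TRUE, FALSE)))
```

Under these assumptions, the grouped call could be written as setDT(data)[, .SD[seq_len(n_before_first_true(Var))], by = ID], which gives the same result on the question's data and also covers IDs that start with TRUE or contain no TRUE.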
You can do this with dplyr's cumall function as well:
library(dplyr)
data %>%
  dplyr::group_by(ID) %>%
  dplyr::filter(dplyr::cumall(!Var))
ID Time Var
<chr> <dbl> <lgl>
1 A 3 FALSE
2 B 3 FALSE
3 C 3 FALSE
4 C 6 FALSE
5 C 9 FALSE
cumall(!x): all cases until the first TRUE