Remove rows starting at the first instance that meets a condition

Question

In the following dataset, I want to remove all rows, sorted by Time and grouped by ID, starting at the first instance where Var is TRUE. Put differently, for each ID I want to keep only the rows where Var is FALSE up until the first TRUE, sorted by Time.

ID <- c('A','B','C','A','B','C','A','B','C','A','B','C')
Time <- c(3,3,3,6,6,6,9,9,9,12,12,12)
Var <- c(F,F,F,T,T,F,T,T,F,T,F,T)
data = data.frame(ID, Time, Var)

data
   ID Time   Var
1   A    3 FALSE
2   B    3 FALSE
3   C    3 FALSE
4   A    6  TRUE
5   B    6  TRUE
6   C    6 FALSE
7   A    9  TRUE
8   B    9  TRUE
9   C    9 FALSE
10  A   12  TRUE
11  B   12 FALSE
12  C   12  TRUE

The desired result for this data frame should be:

 ID Time   Var
  A    3 FALSE
  B    3 FALSE
  C    3 FALSE
  C    6 FALSE
  C    9 FALSE

Note that the solution should not only remove rows where Var == TRUE, but should also remove rows where Var == FALSE if they follow (in Time) another row where Var == TRUE for that ID.

I've tried many different things but can't seem to figure this out. Any help is much appreciated!

Answer 1

Here's how to do that with dplyr using group_by and cumsum.

The rationale is that Var is a logical vector, where FALSE is treated as 0 and TRUE as 1. Within each ID, cumsum(Var) therefore stays at 0 until it hits the first TRUE, and is at least 1 from that row onward.
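To make the rationale concrete, here is a minimal base R sketch on a single vector. It uses the Var values for ID B from the question; the names var_b, running and keep are illustrative only and are not part of the answer's code.

```r
# Var for ID "B" in the example data, ordered by Time (3, 6, 9, 12).
var_b <- c(FALSE, TRUE, TRUE, FALSE)

# Logical values are coerced to 0/1, so the running sum counts the TRUEs seen so far.
running <- cumsum(var_b)
print(running)   # 0 1 2 2

# Only positions before the first TRUE still have a running sum of 0.
keep <- running < 1
print(keep)      # TRUE FALSE FALSE FALSE
```

The trailing FALSE at Time 12 is dropped as well, because the running sum never returns to 0 once a TRUE has been seen.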

library(dplyr)
data%>%
  group_by(ID)%>%
  filter(cumsum(Var)<1)

      ID  Time   Var
  <fctr> <dbl> <lgl>
1      A     3 FALSE
2      B     3 FALSE
3      C     3 FALSE
4      C     6 FALSE
5      C     9 FALSE

Here's the equivalent code with data.table:

library(data.table)
setDT(data)  # convert the data.frame to a data.table by reference
data[data[, .I[cumsum(Var) < 1], by = ID]$V1]
   ID Time   Var
1:  A    3 FALSE
2:  B    3 FALSE
3:  C    3 FALSE
4:  C    6 FALSE
5:  C    9 FALSE
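Both snippets rely on the rows already being in Time order within each ID, which happens to be true for the example data. If that is not guaranteed, one option is to sort first. The following is a base R sketch of the same cumsum idea under that assumption; the names shuffled, sorted, seen_true and result are illustrative, and ave() is swapped in here for the grouped step rather than taken from the answer.

```r
# Example data from the question.
data <- data.frame(
  ID   = c("A","B","C","A","B","C","A","B","C","A","B","C"),
  Time = c(3,3,3,6,6,6,9,9,9,12,12,12),
  Var  = c(FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
)

# Reverse the rows so they are deliberately NOT in Time order.
shuffled <- data[rev(seq_len(nrow(data))), ]

# Sort by ID, then Time, so the running sum follows time order within each ID.
sorted <- shuffled[order(shuffled$ID, shuffled$Time), ]

# ave() applies cumsum within each ID and returns a vector aligned with the rows.
seen_true <- ave(as.integer(sorted$Var), sorted$ID, FUN = cumsum)

# Keep rows where no TRUE has been seen yet for that ID.
result <- sorted[seen_true < 1, ]
print(result)
```

With dplyr, the analogous step would be an arrange(ID, Time) before the group_by.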

Answer 2

This data.table solution should work.

library(data.table)
setDT(data)[, .SD[1:(which.max(Var)-1)], by=ID]
   ID Time   Var
1:  A    3 FALSE
2:  B    3 FALSE
3:  C    3 FALSE
4:  C    6 FALSE
5:  C    9 FALSE

Since you want all rows before the first TRUE value, which.max is the way to go: on a logical vector it returns the position of the first TRUE.
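One caveat worth checking before using this on other data: which.max also returns 1 when a group contains no TRUE at all, and 1:(which.max(Var)-1) evaluates to 1:0 (that is, c(1, 0)) when the first row is TRUE. Neither case occurs in the example data. The sketch below illustrates both cases in base R; rows_before_first_true is a hypothetical helper written for this illustration, not part of the answer.

```r
# On a logical vector, which.max() returns the index of the first TRUE ...
first_a <- which.max(c(FALSE, FALSE, TRUE, TRUE))
print(first_a)   # 3

# ... but it also returns 1 when there is no TRUE at all.
first_b <- which.max(c(FALSE, FALSE, FALSE))
print(first_b)   # 1

# Hypothetical helper: indices of the rows before the first TRUE,
# treating "no TRUE in this group" as "keep every row".
rows_before_first_true <- function(v) {
  first <- match(TRUE, v, nomatch = length(v) + 1L)
  seq_len(first - 1L)
}

print(rows_before_first_true(c(FALSE, FALSE, TRUE, TRUE)))  # 1 2
print(rows_before_first_true(c(TRUE, FALSE)))               # integer(0)
print(rows_before_first_true(c(FALSE, FALSE, FALSE)))       # 1 2 3
```

Inside data.table, such a helper could be used as .SD[rows_before_first_true(Var)] with by = ID; the cumsum approach from Answer 1 handles the same edge cases without a helper.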

Answer 3

You can do this with the cumall verb as well:

library(dplyr)

data %>% 
  dplyr::group_by(ID) %>% 
  dplyr::filter(dplyr::cumall(!Var))

  ID     Time Var  
  <chr> <dbl> <lgl>
1 A         3 FALSE
2 B         3 FALSE
3 C         3 FALSE
4 C         6 FALSE
5 C         9 FALSE

cumall(!x): all cases until the first TRUE
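For readers without dplyr at hand, the same mask can be approximated in base R. This is only a sketch, assuming Var contains no NA values (missing-value handling may differ from cumall); x and mask are illustrative names.

```r
x <- c(FALSE, FALSE, TRUE, FALSE)

# Running "all so far" of !x: TRUE until the first TRUE in x, FALSE afterwards.
mask <- Reduce(`&`, !x, accumulate = TRUE)
print(mask)   # TRUE TRUE FALSE FALSE

# For NA-free input this matches the cumsum-based mask from Answer 1.
print(identical(mask, cumsum(x) == 0))   # TRUE
```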

3 answers

Answer 1: score 2, accepted, 2017-06-15 20:34:25
Answer 2: score 0, 2017-06-15 20:34:07
Answer 3: score 0, 2021-02-08 17:18:52
