Remove rows based on first instance to meet a condition

In the following dataset, I want to remove all rows starting at the first instance, sorted by Time and grouped by ID, where Var is TRUE. Put differently, for each ID I want to keep only the rows that come before the first TRUE, sorted by Time.

ID <- c('A','B','C','A','B','C','A','B','C','A','B','C')
Time <- c(3,3,3,6,6,6,9,9,9,12,12,12)
Var <- c(F,F,F,T,T,F,T,T,F,T,F,T)
data = data.frame(ID, Time, Var)

data
   ID Time   Var
1   A    3 FALSE
2   B    3 FALSE
3   C    3 FALSE
4   A    6  TRUE
5   B    6  TRUE
6   C    6 FALSE
7   A    9  TRUE
8   B    9  TRUE
9   C    9 FALSE
10  A   12  TRUE
11  B   12 FALSE
12  C   12  TRUE

The desired result for this data frame should be:

 ID Time   Var
  A    3 FALSE
  B    3 FALSE
  C    3 FALSE
  C    6 FALSE
  C    9 FALSE

Note that the solution should not only remove rows where Var == TRUE, but should also remove rows where Var == FALSE that follow (in Time) an earlier row where Var == TRUE for that ID.

I've tried many different things but can't seem to figure this out. Any help is much appreciated!

Here's how to do that with dplyr using group_by and cumsum.

The rationale is that Var is a logical vector where FALSE is equal to 0 and TRUE is equal to 1. Within each ID, cumsum will therefore remain at 0 until it hits the first TRUE.
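To make the rationale concrete, here is a small base R sketch on a made-up logical vector (flags is a hypothetical name, not part of the question's data):

```r
# Hypothetical flag vector for a single ID, already in Time order
flags <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

# Logicals are coerced to 0/1, so the running total counts the TRUEs seen so far
running <- cumsum(flags)   # 0 0 1 1 2

# TRUE only for positions before the first TRUE in flags
keep <- running < 1        # TRUE TRUE FALSE FALSE FALSE
```

Note that the FALSE at position 4 is dropped as well, because the running total never returns to 0 once a TRUE has been seen.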

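library(dplyr)

# Assumes rows are already in Time order within each ID; add arrange(Time) first if not
data %>%
  group_by(ID) %>%
  filter(cumsum(Var) < 1)  # keep rows until the running count of TRUEs reaches 1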

      ID  Time   Var
  <fctr> <dbl> <lgl>
1      A     3 FALSE
2      B     3 FALSE
3      C     3 FALSE
4      C     6 FALSE
5      C     9 FALSE

Here's the equivalent code with data.table:

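library(data.table)
setDT(data)  # convert the data.frame to a data.table by reference
# .I holds row numbers; keep those where no TRUE has been seen yet within each ID
data[data[, .I[cumsum(Var) < 1], by = ID]$V1]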
   ID Time   Var
1:  A    3 FALSE
2:  B    3 FALSE
3:  C    3 FALSE
4:  C    6 FALSE
5:  C    9 FALSE

This data.table solution should work.

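library(data.table)
# seq_len() avoids the 1:0 pitfall when an ID starts with TRUE;
# the any() check keeps every row when an ID has no TRUE at all
setDT(data)[, .SD[seq_len(if (any(Var)) which.max(Var) - 1L else .N)], by = ID]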
   ID Time   Var
1:  A    3 FALSE
2:  B    3 FALSE
3:  C    3 FALSE
4:  C    6 FALSE
5:  C    9 FALSE

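Given that you want all the rows up to the first TRUE value, which.max is the way to go: on a logical vector it returns the position of the first TRUE. Note that it also returns 1 when a group contains no TRUE at all, so that case has to be handled separately.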
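The edge cases are easy to check in base R. The following is a sketch on made-up vectors, and the helper name n_before_first_true is hypothetical:

```r
# which.max() gives the position of the first maximum
which.max(c(FALSE, TRUE, TRUE))    # 2: position of the first TRUE
which.max(c(TRUE, FALSE, FALSE))   # 1: the first TRUE is the first element
which.max(c(FALSE, FALSE, FALSE))  # 1: no TRUE, so the first FALSE is the maximum

# 1:0 counts down instead of being empty; seq_len(0) is empty
1:0          # 1 0
seq_len(0)   # integer(0)

# Hypothetical helper: number of leading elements before the first TRUE
n_before_first_true <- function(x) {
  if (any(x)) which.max(x) - 1L else length(x)
}

n_before_first_true(c(FALSE, TRUE, TRUE))    # 1
n_before_first_true(c(TRUE, FALSE, FALSE))   # 0
n_before_first_true(c(FALSE, FALSE, FALSE))  # 3
```

Indexing with seq_len(n_before_first_true(x)) then keeps exactly the elements before the first TRUE in all three cases, whereas 1:(which.max(x) - 1) would wrongly keep the first element in the last two.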

You can also do this with cumall from dplyr:

library(dplyr)

data %>% 
  dplyr::group_by(ID) %>% 
  dplyr::filter(dplyr::cumall(!Var))

  ID     Time Var  
  <chr> <dbl> <lgl>
1 A         3 FALSE
2 B         3 FALSE
3 C         3 FALSE
4 C         6 FALSE
5 C         9 FALSE

cumall(!x): all cases until the first TRUE
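For comparison, here is a base R sketch of the same idea without any packages, assuming Var contains no NA values: cumsum(x) == 0 plays the role of cumall(!x), and ave applies the running count within each ID. The explicit order call and the names sorted, seen, and result are additions for illustration, not part of the answers above.

```r
# Data from the question
data <- data.frame(
  ID   = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
  Time = c(3, 3, 3, 6, 6, 6, 9, 9, 9, 12, 12, 12),
  Var  = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,
           TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Sort by Time so that "first TRUE" means earliest in time
sorted <- data[order(data$Time), ]

# Running count of TRUEs within each ID
seen <- ave(as.integer(sorted$Var), sorted$ID, FUN = cumsum)

# Keep rows where no TRUE has occurred yet for that ID
result <- sorted[seen == 0, ]
result
```

On the question's data this keeps the rows A/3, B/3, C/3, C/6 and C/9, matching the desired result.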
