Delete the observations by matching the two column values
I have the data df. Within each id, I want to delete the observations that come after the first row matching two column values, i.e. cate=Yes and value=1.
df <- data.frame(id=c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6,7,7,7,7,7),
cate=c('No','Yes','Yes','No','Yes','No','Yes','Yes','Yes','No','No','No','Yes','Yes',
'No','No','Yes','Yes','No',NA,'No','Yes','Yes','Yes','No','Yes','Yes','Yes','Yes'),
value=c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0))
df
id cate value
1 1 No 0
2 1 Yes 0
3 1 Yes 0
4 1 No 0
5 1 Yes 0
6 2 No 0
7 2 Yes 1
8 2 Yes 0
9 2 Yes 0
10 2 No 0
11 3 No 0
12 3 No 0
13 3 Yes 0
14 3 Yes 0
15 3 No 0
16 4 No 0
17 4 Yes 0
18 4 Yes 0
19 5 No 0
20 5 Yes 0
21 6 No 0
22 6 Yes 1
23 6 Yes 0
24 6 Yes 0
25 7 No 0
26 7 Yes 1
27 7 Yes 1
28 7 Yes 0
29 7 Yes 0
I want to delete, per group id, the observations that follow the first row where cate=Yes and value=1. Groups with no such row should be kept whole.
Then the expected output is
id cate value
1 1 No 0
2 1 Yes 0
3 1 Yes 0
4 1 No 0
5 1 Yes 0
6 2 No 0
7 2 Yes 1
8 3 No 0
9 3 No 0
10 3 Yes 0
11 3 Yes 0
12 3 No 0
13 4 No 0
14 4 Yes 0
15 4 Yes 0
16 5 No 0
17 5 Yes 0
18 6 No 0
19 6 Yes 1
20 7 No 0
21 7 Yes 1
We could group by 'id', take the cumulative sum (cumsum) of the logical expression cate == 'Yes' & value == 1, take the cumsum again, then filter the rows where the result is less than 2. This keeps every row for an 'id' that has no match, and the rows up to and including the first match for an 'id' that has one.
library(dplyr)
df %>%
group_by(id) %>%
filter(cumsum(cumsum(cate == 'Yes' & value == 1))<= 1) %>%
ungroup
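To see why the second cumsum is needed, it may help to trace the expression on a single group. The sketch below uses the rows of id 7 from the question; the intermediate names (is_match, first_pass, second_pass, keep) are illustrative only and do not appear in the answer.

```r
# Rows of id 7 from the question
cate  <- c("No", "Yes", "Yes", "Yes", "Yes")
value <- c(0, 1, 1, 0, 0)

is_match    <- cate == "Yes" & value == 1  # FALSE TRUE TRUE FALSE FALSE
first_pass  <- cumsum(is_match)            # 0 1 2 2 2
second_pass <- cumsum(first_pass)          # 0 1 3 5 7
keep        <- second_pass <= 1            # TRUE TRUE FALSE FALSE FALSE
```

A single cumsum would not be enough for a group such as id 2, where only one row matches: the first pass stays at 1 for every later row, so filtering on it alone would keep them all. The second pass keeps growing after the first match, which is what cuts the group off there.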
We can use slice to select the indices from 1 to the required row. For an id with no matching row, which(...)[1] is NA, so we wrap it in coalesce with n() to keep all rows of the groups that do not meet our condition.
library(dplyr)
df |> group_by(id) |>
slice(1:coalesce(which(cate == "Yes" & value == 1)[1] , n()))
# A tibble: 21 × 3
# Groups: id [7]
id cate value
<dbl> <chr> <dbl>
1 1 No 0
2 1 Yes 0
3 1 Yes 0
4 1 No 0
5 1 Yes 0
6 2 No 0
7 2 Yes 1
8 3 No 0
9 3 No 0
10 3 Yes 0
# … with 11 more rows
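The index logic behind the slice call can be sketched in base R, without dplyr. The helper name last_row_to_keep is hypothetical, and the if/else stands in for coalesce(..., n()); it is meant as an illustration of the idea, not as the answer's own code.

```r
# Hypothetical helper: index of the last row to keep within one group
last_row_to_keep <- function(cate, value) {
  # which() returns integer(0) when nothing matches, so [1] yields NA
  first_match <- which(cate == "Yes" & value == 1)[1]
  # Fall back to the group size, as coalesce(first_match, n()) would
  if (is.na(first_match)) length(cate) else first_match
}

# id 2 from the question: the first match is the second row
end_id2 <- last_row_to_keep(c("No", "Yes", "Yes", "Yes", "No"), c(0, 1, 0, 0, 0))

# id 4 from the question: no match, so all three rows are kept
end_id4 <- last_row_to_keep(c("No", "Yes", "Yes"), c(0, 0, 0))
```

Here end_id2 is 2 and end_id4 is 3, which corresponds to the 1:2 and 1:3 ranges that slice would receive for those groups.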