Delete observations by matching two column values

I have the data df. I want to delete the observations that come after a row where two column values match, i.e. cate=Yes and value=1.

df <- data.frame(
  id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6,7,7,7,7,7),
  cate = c('No','Yes','Yes','No','Yes','No','Yes','Yes','Yes','No','No','No','Yes','Yes',
           'No','No','Yes','Yes','No',NA,'No','Yes','Yes','Yes','No','Yes','Yes','Yes','Yes'),
  value = c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0)
)
df
   id cate value
1   1   No     0
2   1  Yes     0
3   1  Yes     0
4   1   No     0
5   1  Yes     0
6   2   No     0
7   2  Yes     1
8   2  Yes     0
9   2  Yes     0
10  2   No     0
11  3   No     0
12  3   No     0
13  3  Yes     0
14  3  Yes     0
15  3   No     0
16  4   No     0
17  4  Yes     0
18  4  Yes     0
19  5   No     0
20  5  Yes     0
21  6   No     0
22  6  Yes     1
23  6  Yes     0
24  6  Yes     0
25  7   No     0
26  7  Yes     1
27  7  Yes     1
28  7  Yes     0
29  7  Yes     0

Within each id group, I want to delete the observations that come after the first row matching cate=Yes and value=1.

The expected output is:

   id cate value
1   1   No     0
2   1  Yes     0
3   1  Yes     0
4   1   No     0
5   1  Yes     0
6   2   No     0
7   2  Yes     1
8   3   No     0
9   3   No     0
10  3  Yes     0
11  3  Yes     0
12  3   No     0
13  4   No     0
14  4  Yes     0
15  4  Yes     0
16  5   No     0
17  5  Yes     0
18  6   No     0
19  6  Yes     1
20  7   No     0
21  7  Yes     1

We could group by 'id', take the cumulative sum ( cumsum ) of the logical expression, take the cumsum again, and then filter the rows where the result is less than 2. This keeps every row for an 'id' that has no match, and the rows up to and including the first match for an 'id' that has one.

library(dplyr)
df %>% 
  group_by(id) %>% 
  # the double cumsum is 0 before the first match, 1 at it, and >= 2 after it
  filter(cumsum(cumsum(cate == 'Yes' & value == 1)) <= 1) %>%
  ungroup()
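
To see why the double cumsum works, it may help to trace the intermediate vectors for a single group. The sketch below uses the five rows of id 7 from the question and plain base R vectors; the variable names (hit, first, second, keep) are illustrative only and do not appear in the answer.

```r
# Rows of id 7 from the question: the condition is first met on row 2
cate  <- c("No", "Yes", "Yes", "Yes", "Yes")
value <- c(0, 1, 1, 0, 0)

hit    <- cate == "Yes" & value == 1   # FALSE TRUE TRUE FALSE FALSE
first  <- cumsum(hit)                  # 0 1 2 2 2
second <- cumsum(first)                # 0 1 3 5 7
keep   <- second <= 1                  # TRUE TRUE FALSE FALSE FALSE
```

A single cumsum would not be enough here: filtering on first <= 1 would also keep rows 4 and 5 of a group with only one match, because the running count stays at 1 after the first match.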
  • We can use slice to keep rows 1 through the first matching row in each group. which(...)[1] returns NA when a group has no matching row, so we wrap it in coalesce with n() to keep all rows of such groups.
library(dplyr)

df |> group_by(id) |> 
      slice(1:coalesce(which(cate == "Yes" & value == 1)[1] , n()))
  • Output
# A tibble: 21 × 3
# Groups:   id [7]
      id cate  value
   <dbl> <chr> <dbl>
 1     1 No        0
 2     1 Yes       0
 3     1 Yes       0
 4     1 No        0
 5     1 Yes       0
 6     2 No        0
 7     2 Yes       1
 8     3 No        0
 9     3 No        0
10     3 Yes       0
# … with 11 more rows
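
For readers without dplyr, the same "keep rows up to the first match" idea can be sketched in base R. This is an alternative formulation, not the code from either answer: keep_until_first_match is a hypothetical helper name, and the data frame is copied from the question.

```r
# Data copied from the question
df <- data.frame(
  id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6,7,7,7,7,7),
  cate = c('No','Yes','Yes','No','Yes','No','Yes','Yes','Yes','No','No','No','Yes','Yes',
           'No','No','Yes','Yes','No',NA,'No','Yes','Yes','Yes','No','Yes','Yes','Yes','Yes'),
  value = c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0)
)

# Hypothetical helper: keep rows 1..first match, or every row if nothing matches
keep_until_first_match <- function(g) {
  first <- which(g$cate == "Yes" & g$value == 1)[1]  # NA when the group has no match
  last  <- if (is.na(first)) nrow(g) else first      # plays the role of coalesce(first, n())
  g[seq_len(last), , drop = FALSE]
}

# Split by id, trim each group, and stack the pieces back together
res <- do.call(rbind, lapply(split(df, df$id), keep_until_first_match))
rownames(res) <- NULL
nrow(res)  # 21
```

Here which() silently drops NA values in the logical vector, so the NA in cate does not need separate handling.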
