简体   繁体   中英

Retain records where condition is met across two rows and two columns

I have a dataset similar to this:

df <- 
  read.table(textConnection("ID   Column1   Column2
A 0 1
A 1 0
A 1 0
A 1 0
A 0 1
A 1 0
A 0 1
A 0 0 
A 1 0 
A 1 0
B 0 1
B 1 0 
C 0 1
C 0 0
C 1 0"), header=TRUE)

I am looking to do a group_by ID in dplyr that maintains records where Column2 = '1' and the record underneath it has Column1 = '1'. This may happen more than once per ID; all other records should be excluded. So the output from the above should be:

ID Column1 Column2
A 0 1
A 1 0
A 0 1
A 1 0
B 0 1
B 1 0

Any help will be very much appreciated, thanks!

You could use lag and lead :

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(lead(Column1) == 1 &     Column2  == 1 |
              Column1  == 1 & lag(Column2) == 1) %>%
  ungroup()

# # A tibble: 6 × 3
#   ID    Column1 Column2
#   <chr>   <int>   <int>
# 1 A           0       1
# 2 A           1       0
# 3 A           0       1
# 4 A           1       0
# 5 B           0       1
# 6 B           1       0

Here is an alternative approach:

library(dplyr)

df %>% 
  group_by(ID, x = rep(row_number(), each=2, length.out = n())) %>% 
  filter(sum(Column1)>=1 & sum(Column2)>=1) %>% 
  ungroup() %>% 
  select(-x)

  ID    Column1 Column2
  <chr>   <int>   <int>
1 A           0       1
2 A           1       0
3 A           0       1
4 A           1       0
5 B           0       1
6 B           1       0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM