简体   繁体   中英

R subset rows of same value in one column dependent on multiple values in another column

I am new to R and there may be a simple solution to this but I'm struggling to find one. I wish to subset a data frame to exclude all rows that don't have both values offered in another row. So, let's say this is my data frame: df1

v1 v2 v3
A 1 x
A 2 y
A 3 x
B 4 x
C 5 y
C 6 y
D 7 y
D 8 x

I wish to eliminate any rows that do NOT have both an x and y value (v3) for a corresponding letter (v1) while also keeping all other columns intact (v2) so my final result would be:

v1 v2 v3
A 1 x
A 2 y
A 3 x
D 7 y
D 8 x

Only values A and D would be retained because they have both a corresponding x and a corresponding y value. B and C would be eliminated since they only have either x OR y but not both.

I've tried using group_by and filter. The result comes out as an empty data frame:

library(dplyr)
df2 <- df1 %>% 
     group_by(v1) %>%
     filter(all(c('x', 'y') %in% v3))

as well as:

library(dplyr)
df2 <- df1 %>% 
   group_by(v1) %>% 
   filter(any(v3 == "x"),
          any(v3 == "y"))
df1 %>%
   group_by(v1) %>%
   filter(all(unique(df1$v3) %in% v3))

# A tibble: 5 x 3
# Groups:   v1 [2]
  v1       v2 v3   
  <chr> <int> <chr>
1 A         1 x    
2 A         2 y    
3 A         3 x    
4 D         7 y    
5 D         8 x  

Try this aggregate solution

df1[df1$v1 %in% names( which( table( 
  aggregate( . ~ v3 + v1, df1, c )[,"v1"] ) > 1 )),]

  v1 v2 v3
1  A  1  x
2  A  2  y
3  A  3  x
7  D  7  y
8  D  8  x

Data

df1 <- structure(list(v1 = c("A", "A", "A", "B", "C", "C", "D", "D"),
    v2 = 1:8, v3 = c("x", "y", "x", "x", "y", "y", "y", "x")), class = "data.frame", row.names = c(NA,
-8L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM