
Subsetting a dataframe in R in two steps

I have the following dataframe:

   name  direction to   
   <chr> <fct>     <chr>
 1 A     ->        B    
 2 A     ->        X    
 3 B     ->        X    
 4 B     ->        Y    
 5 C     ->        B    
 6 C     ->        Y    
 7 S     ->        T    
 8 T     ->        C    
 9 W     ->        Y    
10 X     ->        W    
11 Y     NA        NA  

Step 1. I first want to subset the dataframe to keep only the rows that have X or Y in either the name column or the to column.

df %>% dplyr::select(name,direction,to) %>% filter(name %in% c('X','Y') | to %in% c('X','Y'))

  name  direction to   
  <chr> <fct>     <chr>
1 A     ->        X    
2 B     ->        X    
3 B     ->        Y    
4 C     ->        Y    
5 W     ->        Y    
6 X     ->        W    
7 Y     NA        NA  

Step 2. From there, I want to add any other connections whose name matches one of the unique values in name from the Step 1 result. For example, after Step 1 the unique values in name are A, B, C, W, X, Y. I want all observations in the original, unfiltered dataset df where the name column contains any of these values. In this example, observations 1 (A->B) and 5 (C->B) from the original dataframe would be added to the subset.

Expected output:

  name  direction to   
  <chr> <fct>     <chr>
1 A     ->        X    
2 A     ->        B
3 B     ->        X    
4 B     ->        Y 
5 C     ->        B   
6 C     ->        Y    
7 W     ->        Y    
8 X     ->        W    
9 Y     NA        NA  

Let me know if this doesn't make sense.

I think this should work

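# Step 1: rows where name or to is X or Y
dfTmp <- df %>%
  dplyr::select(name, direction, to) %>%
  filter(name %in% c('X', 'Y') | to %in% c('X', 'Y'))

# Step 2: all rows of the original df whose name appears in the Step 1 result
df[df$name %in% dfTmp$name, ]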
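
For anyone who wants to avoid the dplyr dependency, the same two steps can be sketched in base R. This is only an illustration under stated assumptions: the data frame is rebuilt from the values posted in this thread, and the names targets, step1 and result are made up for the example.

```r
# Rebuild the example data from the values posted in this thread.
df <- data.frame(
  name = c("A", "A", "B", "B", "C", "C", "S", "T", "W", "X", "Y"),
  direction = c(rep("->", 10), NA),
  to = c("B", "X", "X", "Y", "B", "Y", "T", "C", "Y", "W", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper name: the values we are searching for.
targets <- c("X", "Y")

# Step 1: rows where either endpoint is a target.
# NA %in% targets is FALSE, so the (Y, NA) row is kept only via its name.
step1 <- df[df$name %in% targets | df$to %in% targets, ]

# Step 2: every row of the original data whose name appears in Step 1.
result <- df[df$name %in% unique(step1$name), ]

print(result)
```

Rows keep their original order here, so A->B comes before A->X, unlike in the expected output in the question.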

We can use if_any to check the 'name' and 'to' columns row by row, which returns a logical vector. We then use that vector to subset 'name', and build the final filter condition with %in%:

library(dplyr)

df %>% 
   filter(name %in% name[if_any(c(name, to), ~ .x %in% c('X', 'Y'))])


# A tibble: 9 × 3
  name  direction to   
  <chr> <chr>     <chr>
1 A     ->        B    
2 A     ->        X    
3 B     ->        X    
4 B     ->        Y    
5 C     ->        B    
6 C     ->        Y    
7 W     ->        Y    
8 X     ->        W    
9 Y     <NA>      <NA>

Usually, if_any is used inside filter to return the rows where at least one of the selected columns meets a condition. Here it checks, for each row, whether 'name' or 'to' contains 'X' or 'Y'; if either column does, the row is flagged. Because if_any returns a logical vector, we use it to subset ( [ ) the 'name' elements, and then apply %in% to the original 'name' column to create the logical vector that filter uses.
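
To make the two logical vectors visible, the one-liner can be unpacked into intermediate columns. This is a rough sketch, assuming dplyr 1.0.4 or later (where if_any is available); the column names hit and keep and the object names flagged and result are invented for illustration, and the data is rebuilt from the values posted in this thread.

```r
library(dplyr)

# Rebuild the example data from the values posted in this thread.
df <- data.frame(
  name = c("A", "A", "B", "B", "C", "C", "S", "T", "W", "X", "Y"),
  direction = c(rep("->", 10), NA),
  to = c("B", "X", "X", "Y", "B", "Y", "T", "C", "Y", "W", NA),
  stringsAsFactors = FALSE
)

flagged <- df %>%
  mutate(
    # TRUE when this row itself has X or Y in name or to (Step 1).
    hit = if_any(c(name, to), ~ .x %in% c("X", "Y")),
    # TRUE when this row's name occurs in any flagged row (Step 2).
    keep = name %in% name[hit]
  )

# Keep the Step 2 rows and drop the helper columns again.
result <- flagged %>%
  filter(keep) %>%
  select(name, direction, to)

print(flagged)
print(result)
```

Printing flagged shows why rows 1 (A->B) and 5 (C->B) are kept: hit is FALSE for them, but keep is TRUE because A and C are flagged in other rows.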


df <- structure(list(name = c("A", "A", "B", "B", "C", "C", "S", "T", 
"W", "X", "Y"), direction = c("->", "->", "->", "->", "->", "->", 
"->", "->", "->", "->", NA), to = c("B", "X", "X", "Y", "B", 
"Y", "T", "C", "Y", "W", NA)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
