简体   繁体   中英

data.table efficiently finding common pairs between 2 columns

say I have a dataframe

  subject stim1 stim2 feedback
1      1003    50    51        1
2      1003    48    50        1
3      1003    49    51        1
4      1003    47    49        1
5      1003    47    46        1
6      1003    46    48        1
10     1003    50    48        1
428    1003    48    51        0
433    1003    46    50        0
434    1003    50    49        0
435    1003    54    59        0

I want to create a new column "transitive_pair" by

  1. group by subject (column 1),
  2. For every row in which feedback==0 (starting index 428, otherwise transitive_pair=NaN).
  3. I want to return a boolean which tells me whether there is any chain of pairings (but only those in which feedback==1) that would transitively link stim1 and stim2 values.

Working out a few examples.

row 428- stim1=48 and stim2=51
48 and 51 are not paired but 51 was paired with 50 (e.g.row 1 ) and 50 was paired with 48 (row 10) so transitive_pair[428]=True
    
row 433- stim 1=46 and stim2=50
46 and 48 were paired (row 6) and 48 was paired with 50 (row 2) so transitive_pair[433]=True
            
in row 435, stim1=54, stim2=59
        
there is no chain of pairs that could link them (59 is not paired with anything while feedback==1) so transitive_pair[435]=False 

desired output

   subject stim1 stim2 feedback transitive_pair
1      1003    50    51        1             NaN
2      1003    48    50        1             NaN
3      1003    49    51        1             NaN
4      1003    47    49        1             NaN
5      1003    47    46        1             NaN
6      1003    46    48        1             NaN
10     1003    50    48        1             NaN
428    1003    48    51        0               1
433    1003    46    50        0               1
434    1003    50    49        0               1
435    1003    54    59        0               0

any help would be greatly appreciated!!

and putting a recreateble df here

structure(list(subject = c(1003L, 1003L, 1003L, 1003L, 1003L, 
1003L, 1003L, 1003L, 1003L, 1003L, 1003L), stim1 = c(50L, 48L, 
49L, 47L, 47L, 46L, 50L, 48L, 46L, 50L, 54L), stim2 = c(51L, 
50L, 51L, 49L, 46L, 48L, 48L, 51L, 50L, 49L, 59L), feedback = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L), transitive_pair = c(NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, 1, 1, 1, 0)), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 10L, 428L, 433L, 434L, 435L), class = "data.frame")

The columns "stim1" and "stim2" define an undirected graph. Create the graph for feedback == 1 , get its connected components and for each row of the data.frame, check if the values of "stim1" and "stim2" belong to the same component. In the end assign NaN to the rows where feedback is 1.

suppressPackageStartupMessages(library(igraph))

inx <- df1$feedback == 1
g <- graph_from_data_frame(df1[inx, c("stim1", "stim2")], directed = FALSE)
plot(g)


g_comp <- components(g)$membership

df1$transitive_pair_2 <- apply(df1[c("stim1", "stim2")], 1, \(x) {
  i <- names(g_comp) == x[1]
  j <- names(g_comp) == x[2]
  if(any(i) & any(j))
    g_comp[i] == g_comp[j]
  else 0L
})

df1$transitive_pair_2[inx] <- NaN
df1
#>     subject stim1 stim2 feedback transitive_pair transitive_pair_2
#> 1      1003    50    51        1             NaN               NaN
#> 2      1003    48    50        1             NaN               NaN
#> 3      1003    49    51        1             NaN               NaN
#> 4      1003    47    49        1             NaN               NaN
#> 5      1003    47    46        1             NaN               NaN
#> 6      1003    46    48        1             NaN               NaN
#> 10     1003    50    48        1             NaN               NaN
#> 428    1003    48    51        0               1                 1
#> 433    1003    46    50        0               1                 1
#> 434    1003    50    49        0               1                 1
#> 435    1003    54    59        0               0                 0

Created on 2022-07-31 by the reprex package (v2.0.1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM