简体   繁体   中英

Finding pairs in a table in R

I need to identfy pairs of numbers in a table in R. The structure of the table is as follows:

   user_A   user_B   counter
1       1        2         5
2       1        3         3
3       2        1        10
4       2        4         8

I want to check if for each pair, an opposite one exists, eg pair (1,2) and pair (2,1), and if so - sum the counter value for both pairs. As a result, I want to have an output in a form of a table like this one:

   user_A   user_B   sum   bi_directional
1       1        2    15             TRUE
2       1        3     3            FALSE
3       2        4     8            FALSE

Thank you in advance!

We can sort the first two column row wise with apply (MARGIN=1), cbind with the third column ('d1'), get an index of duplicates of the first two column ('i1'). Convert to 'data.table' ( setDT(d2) ), grouped by 'user_A', and 'user_B', get the sum of 'Counter' and the first row of 'i1'.

 d1 <- setNames(cbind(t(apply(df[1:2], 1, sort)), df[3]), names(df))
  i1 <- duplicated(d1[1:2])|duplicated(d1[1:2], fromLast=TRUE)

 d2 <- cbind(d1, i1)
 library(data.table)
 setDT(d2)[, list(counter=sum(counter), bi_directional=i1[1L]) ,.(user_A, user_B)]
 #  user_A user_B counter bi_directional
 #1:      1      2      15           TRUE
 #2:      1      3       3          FALSE
 #3:      2      4       8          FALSE

Or another option is

setDT(df)[user_A > user_B, c('user_B', 'user_A') := 
       list(user_A, user_B)]
df[, list(counter= sum(counter), bi_directional= .N>1), 
                                   by = .(user_A, user_B)]
#     user_A user_B counter bi_directional
#1:      1      2      15           TRUE
#2:      1      3       3          FALSE
#3:      2      4       8          FALSE

This is a dplyr solution:

df %>%
  mutate(user_A2 = pmin(user_A, user_B),
         user_B = pmax(user_A, user_B),
         user_A = user_A2) %>%
  select(-user_A2) %>%
  group_by(user_A, user_B) %>%
  summarise(sum = sum(counter), bi_directional = n() > 1) %>%
  as.data.frame
##   user_A user_B sum bi_directional
## 1      1      2  15           TRUE
## 2      1      3   3          FALSE
## 3      2      4   8          FALSE

mutate() is used to redefine user_A and user_B such that the smaller of the two values is always in the first column. Then the auxiliary column user_A2 is dropped. The data is grouped by user_A and user_B and two summaries are calculated for each group: The sum over counter and whether there was more than one value. This latter uses n() which counts the number of rows in the group.

The last line uses as.data.frame to transform back to a data frame. This is only needed, if you insist on having a data frame as your result.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM