
R: Enumerating Matches across 2 Vectors

I am working with a dataset that has information on couples. Person 1 of each couple is identified by a unique ID in column ID1, and Person 2 by a unique ID in column ID2. The dataset looks like this:

stack <- cbind(ID1 =         c(1, 2, 2, 3, 4, 4, 4, 5, 6), 
               ID2 =         c(4, 3, 3, 2, 1, 1, 1, 6, 5),
               what_I_want = c(1, 2, 2, 2, 1, 1, 1, 3, 3))

What I want is simply an enumeration of the distinct couples, as shown in column what_I_want. The task is not so easy because several rows can refer to the same couple (rows 1, 5, 6 and 7 are all about couple number 1). On top of that, couples do not all have the same number of rows (couple 1 shows up in 4 rows, couple 2 in 3 rows, etc.). That is why I am struggling with this. I thought about for loops and merging, but I can't figure out how to do it. Any help would be highly appreciated <3

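One convenient option is to use igraph, treating each row as an edge and numbering the connected components:

library(igraph)

# df holds the question's data as a data frame; components() is the current name for clusters()
grp <- components(graph_from_data_frame(df[1:2]))$membership
df$what_I_want <- grp[match(df$ID1, names(grp))]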

  ID1 ID2 what_I_want
1   1   4           1
2   2   3           2
3   2   3           2
4   3   2           2
5   4   1           1
6   4   1           1
7   4   1           1
8   5   6           3
9   6   5           3
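One point worth checking before relying on the graph approach: connected components group everyone who is linked directly or indirectly, so the result matches a per-couple numbering only if no person appears in more than one couple. The sketch below uses a small hypothetical data frame (chain, not from the question) in which person 2 appears in two couples, and compares a pair-based ID with a component-based one. The igraph part is skipped if the package is not installed.

```r
# Hypothetical data: person 2 appears in two different couples (1-2 and 2-3)
chain <- data.frame(ID1 = c(1, 2, 3), ID2 = c(2, 3, 2))

# Pair-based numbering: one ID per unordered pair of IDs
key <- paste(pmin(chain$ID1, chain$ID2), pmax(chain$ID1, chain$ID2))
chain$pair_id <- match(key, unique(key))

# Component-based numbering: one ID per group of linked people
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(chain[c("ID1", "ID2")])
  grp <- igraph::components(g)$membership
  chain$component_id <- unname(grp[as.character(chain$ID1)])
}

chain
```

With this data the pair-based column separates couple 1-2 from couple 2-3, while the component-based column puts all three rows in one group. For the data in the question, where every person has exactly one partner, the two approaches agree.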

If your IDs are numeric, you could use dplyr:

library(dplyr)

stack %>%
  as.data.frame() %>%
  mutate(small = pmin(ID1, ID2),
         large = pmax(ID1, ID2)) %>%
  group_by(small, large) %>%
  mutate(number = cur_group_id()) %>%
  ungroup() %>%
  select(-small, -large)

returns

# A tibble: 9 x 4
    ID1   ID2 what_I_want number
  <dbl> <dbl>       <dbl>  <int>
1     1     4           1      1
2     2     3           2      2
3     2     3           2      2
4     3     2           2      2
5     4     1           1      1
6     4     1           1      1
7     4     1           1      1
8     5     6           3      3
9     6     5           3      3

First we sort the IDs within each row, so (1,4) and (4,1) both become (1,4). Then we use the sorted IDs as grouping variables and add a group ID.
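To make the sorting step concrete, here is a minimal base R sketch that shows only the intermediate columns for the question's data. The names small, large and pairs mirror the dplyr code above or are chosen for illustration.

```r
# Data copied from the question
ID1 <- c(1, 2, 2, 3, 4, 4, 4, 5, 6)
ID2 <- c(4, 3, 3, 2, 1, 1, 1, 6, 5)

# Element-wise minimum and maximum put each pair in a fixed order
small <- pmin(ID1, ID2)
large <- pmax(ID1, ID2)

# After sorting, each couple has exactly one (small, large) representation
pairs <- unique(data.frame(small, large))
pairs
```

Nine rows collapse to three distinct (small, large) pairs, one per couple, which is what makes them usable as grouping keys.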

Here's a base R option:

vec <- with(df, paste(pmin(ID1, ID2), pmax(ID1, ID2)))
df$result <- match(vec, unique(vec))
df

#  ID1 ID2 result
#1   1   4      1
#2   2   3      2
#3   2   3      2
#4   3   2      2
#5   4   1      1
#6   4   1      1
#7   4   1      1
#8   5   6      3
#9   6   5      3
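The same idea can be wrapped in a small helper. This is only a sketch: couple_id and its sep argument are hypothetical names, not part of base R. The explicit separator guards against two different pairs pasting to the same string, which could happen if the IDs themselves contained the separator. Because pmin() and pmax() also accept character vectors, the helper works for character IDs as well, with the ordering then following the locale's string comparison.

```r
# Hypothetical helper: number couples in order of first appearance
couple_id <- function(id1, id2, sep = "\r") {
  # Build an order-independent key for each row
  key <- paste(pmin(id1, id2), pmax(id1, id2), sep = sep)
  # Position of each key among the distinct keys
  match(key, unique(key))
}

# Data copied from the question
df <- data.frame(ID1 = c(1, 2, 2, 3, 4, 4, 4, 5, 6),
                 ID2 = c(4, 3, 3, 2, 1, 1, 1, 6, 5),
                 what_I_want = c(1, 2, 2, 2, 1, 1, 1, 3, 3))
df$result <- couple_id(df$ID1, df$ID2)

# Check against the expected column from the question
all(df$result == df$what_I_want)

# Character IDs (made-up names) work the same way
couple_id(c("ann", "bob", "cat"), c("bob", "ann", "dan"))
```

Note that match(key, unique(key)) numbers couples by the row in which they first appear, so reordering the rows can change which couple gets which number.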

An option with igraph + stack + merge:

library(igraph)

merge(df,
  stack(
    membership(
      components(
        graph_from_data_frame(df)
      )
    )
  ),
  by.x = "ID1",
  by.y = "ind",
  all.x = TRUE
)

which gives

  ID1 ID2 values
1   1   4      1
2   2   3      2
3   2   3      2
4   3   2      2
5   4   1      1
6   4   1      1
7   4   1      1
8   5   6      3
9   6   5      3

