I am working with a dataset that has information on couples. Person 1 of the couple, identified by its unique ID in column ID1, forms a couple with Person 2 of the couple, identified by its unique ID in column ID2. The dataset looks like this:
stack <- cbind(ID1 = c(1, 2, 2, 3, 4, 4, 4, 5, 6),
               ID2 = c(4, 3, 3, 2, 1, 1, 1, 6, 5),
               what_I_want = c(1, 2, 2, 2, 1, 1, 1, 3, 3))
What I want is simply an enumeration of the different couples. You can see what I mean in column what_I_want. The task is not so easy because several rows are about the same couple (rows 1, 5, 6 and 7 are all about couple number 1). On top of that, not all couples have the same number of rows (couple 1 shows up in 4 rows, couple 2 in 3 rows, etc.). That is why I am struggling with this. I thought about for loops and merging, but I can't figure out how to do it. Any help would be highly appreciated <3
One convenient option is to use igraph:
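library(igraph)
df <- as.data.frame(stack)  # assumption: df is the question's matrix as a data frame
grp <- components(graph_from_data_frame(df[1:2]))$membership  # components() is the current name for clusters()
df$what_I_want <- grp[match(df$ID1, names(grp))]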
  ID1 ID2 what_I_want
1   1   4           1
2   2   3           2
3   2   3           2
4   3   2           2
5   4   1           1
6   4   1           1
7   4   1           1
8   5   6           3
9   6   5           3
If your IDs are numeric, you could use dplyr:
library(dplyr)
stack %>%
  as.data.frame() %>%
  mutate(small = pmin(ID1, ID2),
         large = pmax(ID1, ID2)) %>%
  group_by(small, large) %>%
  mutate(number = cur_group_id()) %>%
  ungroup() %>%
  select(-small, -large)
returns
# A tibble: 9 x 4
    ID1   ID2 what_I_want number
  <dbl> <dbl>       <dbl>  <int>
1     1     4           1      1
2     2     3           2      2
3     2     3           2      2
4     3     2           2      2
5     4     1           1      1
6     4     1           1      1
7     4     1           1      1
8     5     6           3      3
9     6     5           3      3
First we sort the IDs by size, so (1,4) and (4,1) are both transformed to (1,4). Finally, we use these sorted IDs as grouping variables and add a group id.
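To see the sorting step in isolation, here is a minimal base R sketch. The names id1, id2, key and couple are illustrative and not part of the original answer; the data are the first five rows of the question's example.

```r
# Illustrative data: first five rows of the question's ID1/ID2 columns
id1 <- c(1, 2, 2, 3, 4)
id2 <- c(4, 3, 3, 2, 1)

# Put the smaller ID first so that (1, 4) and (4, 1) produce the same key
key <- paste(pmin(id1, id2), pmax(id1, id2))

# Number the keys in order of first appearance
couple <- match(key, unique(key))
couple
# [1] 1 2 2 2 1
```

Note that this sketch numbers couples by first appearance, whereas cur_group_id() numbers groups by their sorted keys. For the example data both orderings happen to coincide.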
Here's a base R option -
vec <- with(df, paste(pmin(ID1, ID2), pmax(ID1, ID2)))
df$result <- match(vec, unique(vec))
df
#   ID1 ID2 result
# 1   1   4      1
# 2   2   3      2
# 3   2   3      2
# 4   3   2      2
# 5   4   1      1
# 6   4   1      1
# 7   4   1      1
# 8   5   6      3
# 9   6   5      3
An option with igraph + stack + merge:
library(igraph)
merge(df,
  stack(
    membership(
      components(
        graph_from_data_frame(df)
      )
    )
  ),
  by.x = "ID1",
  by.y = "ind",
  all.x = TRUE
)
which gives
  ID1 ID2 values
1   1   4      1
2   2   3      2
3   2   3      2
4   3   2      2
5   4   1      1
6   4   1      1
7   4   1      1
8   5   6      3
9   6   5      3