I'm dealing with the following issue. I would like to sum Count by Date and by each unordered pair of ID1 and ID2, meaning that AB and BA count as ONE pair. However, I want to keep both rows of each pair in my dataset, each carrying the pair's sum.
My dataset looks like this:
Date ID1 ID2 Count
12-1 A B 1
12-1 B A 1
12-1 D E 1
12-1 E D 2
12-2 Y Z 2
12-2 Z Y 3
The expected output looks like this:
Date ID1 ID2 SUM
12-1 A B 2
12-1 B A 2
12-1 D E 3
12-1 E D 3
12-2 Y Z 5
12-2 Z Y 5
My question can be seen as an extension of this previous question:
R sum observations by unique column PAIRS (BA and AB) and NOT unique combinations (BA or AB)
Many thanks in advance.
Here is a way.
First, for each row, sort the values in columns ID1 and ID2 and paste them together into a key. Then compute the group sums with ave. Finally, remove the temporary key column.
df1$unique <- apply(df1[c("ID1", "ID2")], 1, \(x) paste(sort(x), collapse = ""))
# group by Date as well as by the pair key, as the question asks
df1$Sum <- with(df1, ave(Count, Date, unique, FUN = sum))
df1$unique <- NULL
df1
# Date ID1 ID2 Count Sum
#1 12-1 A B 1 2
#2 12-1 B A 1 2
#3 12-1 D E 1 3
#4 12-1 E D 2 3
#5 12-2 Y Z 2 5
#6 12-2 Z Y 3 5
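In the sample data no pair occurs on more than one date, so the output looks the same whether or not Date is one of the grouping variables. The following is a minimal sketch, assuming a hypothetical data frame df2 (not part of the question) in which the pair A/B appears on two dates. It shows ave being given both Date and the pair key so that the sums stay within each date.

```r
# Hypothetical data: the pair A/B appears on two different dates.
df2 <- data.frame(
  Date  = c("12-1", "12-1", "12-2", "12-2"),
  ID1   = c("A", "B", "A", "B"),
  ID2   = c("B", "A", "B", "A"),
  Count = c(1L, 1L, 10L, 20L)
)

# Order-independent key per row: sort the two IDs, then paste them.
key <- apply(df2[c("ID1", "ID2")], 1, function(x) paste(sort(x), collapse = "_"))

# Group by Date and by the key, so sums do not leak across dates.
df2$Sum <- ave(df2$Count, df2$Date, key, FUN = sum)
df2$Sum  # 2 2 30 30
```

Grouping by the key alone would give 32 in every row here, which is not what the question asks for.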
This may also be done with pmin/pmax to create a grouping column:
library(dplyr)
library(stringr)
df1 %>%
  group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2))) %>%
  mutate(Sum = sum(Count)) %>%
  ungroup() %>%
  select(-grp)
Output:
# A tibble: 6 × 5
Date ID1 ID2 Count Sum
<chr> <chr> <chr> <int> <int>
1 12-1 A B 1 2
2 12-1 B A 1 2
3 12-1 D E 1 3
4 12-1 E D 2 3
5 12-2 Y Z 2 5
6 12-2 Z Y 3 5
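To see why pmin/pmax give an order-independent key, it may help to look at them on plain character vectors. The sketch below uses base R only, and the vectors id1 and id2 are made up for illustration. It also shows a possible pitfall that does not affect the single-letter IDs in the question: if IDs can have more than one character, pasting without a separator can make different pairs collide.

```r
# Hypothetical IDs; the last two rows are different pairs.
id1 <- c("A", "B", "AB", "A")
id2 <- c("B", "A", "C", "BC")

# Element-wise smaller and larger ID (string comparison).
lo <- pmin(id1, id2)
hi <- pmax(id1, id2)

# Without a separator, ("AB", "C") and ("A", "BC") give the same key.
key_nosep <- paste0(lo, hi)

# With a separator that never occurs in an ID, the keys stay distinct.
key_sep <- paste(lo, hi, sep = "_")

key_nosep
key_sep
```

If multi-character IDs are possible, passing a separator to str_c (for example sep = "_") in the pipeline above should avoid the collision, assuming that character never appears inside an ID.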
Data:
df1 <- structure(list(
  Date = c("12-1", "12-1", "12-1", "12-1", "12-2", "12-2"),
  ID1 = c("A", "B", "D", "E", "Y", "Z"),
  ID2 = c("B", "A", "E", "D", "Z", "Y"),
  Count = c(1L, 1L, 1L, 2L, 2L, 3L)),
  class = "data.frame", row.names = c(NA, -6L))
Here is a dplyr solution making use of lapply:
In essence, we create a new column y that holds the two IDs in alphabetical order, so that we can also group by this column:
library(dplyr)
library(stringr)
df1 %>%
  mutate(x = paste(ID1, ID2)) %>%
  mutate(y = str_split(x, ' ') %>% lapply(., 'sort') %>% lapply(., 'paste', collapse=' ')) %>%
  group_by(Date, y) %>%
  mutate(SUM = sum(Count)) %>%
  ungroup() %>%
  select(-c(x, y, Count))
Date ID1 ID2 SUM
<chr> <chr> <chr> <int>
1 12-1 A B 2
2 12-1 B A 2
3 12-1 D E 3
4 12-1 E D 3
5 12-2 Y Z 5
6 12-2 Z Y 5
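Because lapply returns a list, the column y above is a list column. As a small sketch of an alternative (base R only, with a made-up vector x standing in for the pasted ID column), vapply can do the same split, sort and paste step while returning a plain character vector, which may be easier to inspect and group by.

```r
# Stand-in for the pasted "ID1 ID2" column.
x <- c("A B", "B A", "D E", "E D")

# Split each string, sort the pieces, and paste them back together.
# vapply guarantees one character string per element.
y <- vapply(
  strsplit(x, " ", fixed = TRUE),
  function(ids) paste(sort(ids), collapse = " "),
  character(1)
)
y  # "A B" "A B" "D E" "D E"
```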