
R: how to combine the values of a column across two rows when the rows share the same character strings in two other columns

I am constructing an edge list for a network. I would like to sum the values in the third column whenever the first two columns contain the same pair of nodes, regardless of order. My data looks like this:

ego    alter   weight
A      B       12
B      A       10
C      D       5
D      C       2
E      F       7
F      E       6
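
To reproduce the answers, the sample data can be entered as a data frame. This is only a minimal sketch: the name df is assumed because most of the answers use it, and the columns are assumed to be character (not factor) and integer.

```r
# Sample edge list from the question; the object name df is an assumption
df <- data.frame(
  ego    = c("A", "B", "C", "D", "E", "F"),
  alter  = c("B", "A", "D", "C", "F", "E"),
  weight = c(12L, 10L, 5L, 2L, 7L, 6L),
  stringsAsFactors = FALSE  # keep node names as character on R < 4.0 as well
)

df
```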

The dataset I expect is like this:

ego    alter   weight
A      B       22
C      D       7
E      F       13

Please enlighten me if you have some great ideas to achieve the expected result.

A possible solution:

library(tidyverse)

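df %>% 
  rowwise() %>% 
  # order-independent key: "AB" for both A-B and B-A
  mutate(aux = sort(c(ego, alter)) %>% str_c(collapse = "")) %>% 
  group_by(aux) %>% 
  # keep the first row's labels and sum the weights within each pair
  summarise(ego = first(ego), alter = first(alter), weight = sum(weight), .groups = "drop") %>% 
  select(-aux)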

#> # A tibble: 3 × 3
#>   ego   alter weight
#>   <chr> <chr>  <int>
#> 1 A     B         22
#> 2 C     D          7
#> 3 E     F         13

Or, avoiding rowwise:

library(tidyverse)

df %>% 
  # build the same order-independent key row by row with apply()
  mutate(aux = apply(df[1:2], 1, \(x) sort(x) %>% paste0(collapse = ""))) %>% 
  group_by(aux) %>% 
  summarise(ego = first(ego), alter = first(alter), weight = sum(weight), .groups = "drop") %>% 
  select(-aux)

#> # A tibble: 3 × 3
#>   ego   alter weight
#>   <chr> <chr>  <int>
#> 1 A     B         22
#> 2 C     D          7
#> 3 E     F         13

And yet another solution, a bit more succinct:

library(tidyverse)

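df %>% 
  # group directly on the sorted, pasted pair of names (e.g. "AB")
  group_by(aux = map2_chr(ego, alter, ~ sort(c(.x, .y)) %>% str_c(collapse = ""))) %>% 
  summarise(weight = sum(weight)) %>% 
  # split the key back into two columns; the regex assumes single uppercase-letter names
  extract(aux, c("ego", "alter"), "([[:upper:]])([[:upper:]])")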

#> # A tibble: 3 × 3
#>   ego   alter weight
#>   <chr> <chr>  <int>
#> 1 A     B         22
#> 2 C     D          7
#> 3 E     F         13
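
The pipelines above paste the sorted names together without a separator, and the last one splits the key back with a single-letter regex, so they implicitly assume one-character node names. A hedged alternative sketch for longer names keeps the sorted pair in two temporary columns instead. The names node_lo and node_hi are hypothetical, and the sample data is rebuilt here (as an assumption) so the block runs on its own.

```r
library(dplyr)

# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  ego    = c("A", "B", "C", "D", "E", "F"),
  alter  = c("B", "A", "D", "C", "F", "E"),
  weight = c(12L, 10L, 5L, 2L, 7L, 6L),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Temporary names are used on purpose: group_by() evaluates its expressions
  # in order, so overwriting ego first would change the input of pmax().
  group_by(node_lo = pmin(ego, alter), node_hi = pmax(ego, alter)) %>%
  summarise(weight = sum(weight), .groups = "drop") %>%
  rename(ego = node_lo, alter = node_hi)

result
```

Because no key string is pasted together and split again, this variant does not depend on the length or case of the node names.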

You could do the following:

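library(tidyverse)

# order-independent key for each row, e.g. "AB" for both A-B and B-A
f <- function(e, a) sapply(seq_along(e), \(i) paste0(sort(c(e[i], a[i])), collapse = ""))

group_by(df, grp = f(ego, alter)) %>% 
  summarize(weight = sum(weight), .groups = "drop") %>% 
  # split after the first character; assumes one-character node names
  separate(grp, c("ego", "alter"), sep = 1)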

Output:

  ego   alter weight
  <chr> <chr>  <int>
1 A     B         22
2 C     D          7
3 E     F         13

A base R option using pmin/pmax + aggregate:

aggregate(
    weight ~ .,
    transform(
        df,
        ego = pmin(ego,alter),
        alter = pmax(ego,alter)
    ),
    sum
)

gives

  ego alter weight
1   A     B     22
2   C     D      7
3   E     F     13
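
The base R answer relies on transform() evaluating every expression against the original columns, so pmax() still sees the untouched ego even though ego is reassigned in the same call. A small sketch that separates the two steps to make this visible, assuming ego and alter are character columns; the name canon is hypothetical and the sample data is rebuilt so the block runs on its own.

```r
# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  ego    = c("A", "B", "C", "D", "E", "F"),
  alter  = c("B", "A", "D", "C", "F", "E"),
  weight = c(12L, 10L, 5L, 2L, 7L, 6L),
  stringsAsFactors = FALSE
)

# Step 1: put the alphabetically smaller name in ego and the larger in alter.
# Both expressions are evaluated on the original df, not on each other's result.
canon <- transform(df, ego = pmin(ego, alter), alter = pmax(ego, alter))
canon

# Step 2: rows describing the same pair now have identical keys, so sum them.
result <- aggregate(weight ~ ego + alter, data = canon, FUN = sum)
result
```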

Or, we can use igraph:

library(igraph)

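df %>%
    # undirected graph: A-B and B-A become parallel edges
    graph_from_data_frame(directed = FALSE) %>%
    # merge parallel edges; the weight attribute is summed by default
    simplify() %>%
    igraph::as_data_frame(what = "edges")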

which gives

  from to weight
1    A  B     22
2    C  D      7
3    E  F     13
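
In the igraph answer, simplify() merges the parallel edges, and the weights are added up because summing is igraph's default combination rule for an edge attribute named weight. A sketch that states the rule explicitly instead of relying on the default option; the intermediate name g is hypothetical and the sample data is rebuilt so the block runs on its own.

```r
library(igraph)

# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  ego    = c("A", "B", "C", "D", "E", "F"),
  alter  = c("B", "A", "D", "C", "F", "E"),
  weight = c(12L, 10L, 5L, 2L, 7L, 6L),
  stringsAsFactors = FALSE
)

# The third column becomes the edge attribute "weight"
g <- graph_from_data_frame(df, directed = FALSE)

# Sum "weight" over merged edges and drop any other edge attribute
g <- simplify(g, edge.attr.comb = list(weight = "sum", "ignore"))

result <- igraph::as_data_frame(g, what = "edges")
result
```

Spelling out edge.attr.comb keeps the result stable even if the session's igraph options have been changed.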


 