简体   繁体   中英

R - Grouping rows by matching value then adding rows to matching columns in another data frame

I am trying to add values from one data frame (ex2) to an existing data frame (ex1) based on two different columns. As you can see, there is an ID column in both data frames. But in ex2, each column of ex1 is represented by a different row instead of a column. For each matching ID, I want to add the result from ex2$result to the matching row in ex1 under the appropriate column heading (if ex2$alpha[i] = a then ex2$result[i] gets added to ex1$a[z] where ex2$id[i]=ex1$id[z]). Another complication is that not all of the columns from ex1 will have alpha value in ex2, so those should be set as 'NA'.

ex1 <- data.frame(
  id = c(1:20),
  a = c(rep(1,5),rep(0,5),rep(NA,10)),
  b = c(rep(c(1,0),5),rep(NA,10)),
  c = c(rep(c(0,1),5),rep(NA,10)),
  d = c(rep(0,5),rep(1,5),rep(NA,10))
)

ex2 <- data.frame(
  id = c(rep(11,3),rep(12,3),rep(13,3),
         rep(14,2),rep(15,2),
         rep(16,4),rep(17,4),rep(18,4),rep(19,4),rep(20,4)),
  alpha = c(rep(c('a','b','d'),3),rep(c('a','b'),2),
rep(c('a','b','c','d'),5)),
  result = c(rep(c(0,1,1),11))
)

Thanks for your help!

I believe the attached snippet does what you want it to do. But it is hard to know from your toy data if it is feasible to write out the columns a to d in the mutate statement. There surely is a more clever programmatic way to approach this problem.

ex1 <- data.frame(
    id = c(1:20),
    a = c(rep(1,5),rep(0,5),rep(NA,10)),
    b = c(rep(c(1,0),5),rep(NA,10)),
    c = c(rep(c(0,1),5),rep(NA,10)),
    d = c(rep(0,5),rep(1,5),rep(NA,10))
)

ex2 <- data.frame(
    id = c(rep(11,3),rep(12,3),rep(13,3),
          rep(14,2),rep(15,2),
          rep(16,4),rep(17,4),rep(18,4),rep(19,4),rep(20,4)),
    alpha = c(rep(c('a','b','d'),3),rep(c('a','b'),2),
            rep(c('a','b','c','d'),5)),
    result = c(rep(c(0,1,1),11))
)

library(tidyverse)
ex_2_wide <- pivot_wider(ex2, id_cols = id, names_from = alpha, values_from = result  )

joined <- full_join(ex1, ex_2_wide, by = c("id" = "id")) %>%
    mutate(a = coalesce(a.x, a.y)) %>%
    mutate(b = coalesce(b.x, b.y)) %>%
    mutate(c = coalesce(c.x, c.y)) %>%
    mutate(d = coalesce(d.x, d.y)) %>%
    select(-(a.x:c.y))

joined
#>    id a b  c  d
#> 1   1 1 1  0  0
#> 2   2 1 0  1  0
#> 3   3 1 1  0  0
#> 4   4 1 0  1  0
#> 5   5 1 1  0  0
#> 6   6 0 0  1  1
#> 7   7 0 1  0  1
#> 8   8 0 0  1  1
#> 9   9 0 1  0  1
#> 10 10 0 0  1  1
#> 11 11 0 1 NA  1
#> 12 12 0 1 NA  1
#> 13 13 0 1 NA  1
#> 14 14 0 1 NA NA
#> 15 15 1 0 NA NA
#> 16 16 1 1  0  1
#> 17 17 1 0  1  1
#> 18 18 0 1  1  0
#> 19 19 1 1  0  1
#> 20 20 1 0  1  1

Created on 2021-01-07 by the reprex package (v0.3.0)

EDIT:

If we turn the problem around (we first make long tables, followed by join and merge, then pivot back wide), there is only a single step for merger, no matter how many columns you have.

library(tidyverse)

ex1_long <- pivot_longer(ex1, cols = a:d, names_to = "alpha")                                           

joined <- full_join(ex1_long, ex2, by = c("id" = "id", "alpha" = "alpha")) %>%
    mutate(value = coalesce(value, result)) %>% select(-result) %>%
    pivot_wider(id_cols = id, names_from = alpha, values_from = value)

joined
#> # A tibble: 20 x 5
#>       id     a     b     c     d
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1     1     1     0     0
#>  2     2     1     0     1     0
#>  3     3     1     1     0     0
#>  4     4     1     0     1     0
#>  5     5     1     1     0     0
#>  6     6     0     0     1     1
#>  7     7     0     1     0     1
#>  8     8     0     0     1     1
#>  9     9     0     1     0     1
#> 10    10     0     0     1     1
#> 11    11     0     1    NA     1
#> 12    12     0     1    NA     1
#> 13    13     0     1    NA     1
#> 14    14     0     1    NA    NA
#> 15    15     1     0    NA    NA
#> 16    16     1     1     0     1
#> 17    17     1     0     1     1
#> 18    18     0     1     1     0
#> 19    19     1     1     0     1
#> 20    20     1     0     1     1

Created on 2021-01-07 by the reprex package (v0.3.0)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM