Merging corresponding data.frames in separate lists in R

Question

Please consider the dummy data below:

df1.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_1", 3)))

df1.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_1", 3)))

df1.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_1", 3)))

list1 <- list(df1.1, df1.2, df1.3)

#> list1
#[[1]]
#        var1       var2 state city  ind   sample
#1 0.91851330 0.37539222    MG   BH ind1 sample_1
#2 0.07248773 0.28406666    MG   BH ind1 sample_1
#3 0.66276294 0.09738144    MG   BH ind1 sample_1
#
#[[2]]
#        var1      var2 state city  ind   sample
#1 0.03620023 0.3837086    MG   MC ind2 sample_1
#2 0.81407863 0.4763247    MG   MC ind2 sample_1
#3 0.61538142 0.4526425    MG   MC ind2 sample_1
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.1249893 0.0918184    MG   IT ind3 sample_1
#2 0.1323642 0.7891568    MG   IT ind3 sample_1
#3 0.7305105 0.2438753    MG   IT ind3 sample_1
#

df2.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_2", 3)))

df2.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_2", 3)))

df2.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_2", 3)))

list2 <- list(df2.1, df2.2, df2.3)

#> list2
#[[1]]
#        var1      var2 state city  ind   sample
#1 0.01054156 0.3740587    MG   BH ind1 sample_2
#2 0.24489289 0.6290580    MG   BH ind1 sample_2
#3 0.36355003 0.2140268    MG   BH ind1 sample_2
#
#[[2]]
#       var1      var2 state city  ind   sample
#1 0.2904603 0.1390745    MG   MC ind2 sample_2
#2 0.3843579 0.8289106    MG   MC ind2 sample_2
#3 0.4403131 0.6055418    MG   MC ind2 sample_2
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.4711878 0.1148234    MG   IT ind3 sample_2
#2 0.4038921 0.3908316    MG   IT ind3 sample_2
#3 0.3886416 0.9038296    MG   IT ind3 sample_2

df3.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_3", 3)))

df3.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_3", 3)))

df3.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_3", 3)))

list3 <- list(df3.1, df3.2, df3.3)

#> list3
#[[1]]
#       var1      var2 state city  ind   sample
#1 0.2672011 0.5336193    MG   BH ind1 sample_3
#2 0.4413970 0.8593835    MG   BH ind1 sample_3
#3 0.3981449 0.6585343    MG   BH ind1 sample_3
#
#[[2]]
#       var1       var2 state city  ind   sample
#1 0.5090785 0.88560620    MG   MC ind2 sample_3
#2 0.1666667 0.08849541    MG   MC ind2 sample_3
#3 0.5226845 0.41225280    MG   MC ind2 sample_3
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.7137117 0.3715057    MG   IT ind3 sample_3
#2 0.9605454 0.9443209    MG   IT ind3 sample_3
#3 0.1546365 0.6869942    MG   IT ind3 sample_3

My goal is to unify the three lists into a single one. The data for each individual (ind) should be combined into a single data.frame.

For numeric variables such as var1 and var2, I want the result to be the average of each row across samples.

For variables like state, city and ind, I want the values to be kept (they are the same in every list).

The variable sample has a different category in each list (sample_1, sample_2, sample_3). I would like to assign a new value to this variable in the unified data.frame.

The result I'm aiming for would look like the example below:

#> list_unified
#[[1]]
#       var1      var2 state city  ind  sample
#1 0.4590084 0.4549876    MG   BH ind1 unified
#2 0.1899593 0.4472606    MG   BH ind1 unified
#3 0.7441010 0.1136819    MG   BH ind1 unified
#
#[[2]]
#       var1      var2 state city  ind  sample
#1 0.5445125 0.1096332    MG   MC ind2 unified
#2 0.4039724 0.4898337    MG   MC ind2 unified
#3 0.9519204 0.1769643    MG   MC ind2 unified
#
#[[3]]
#       var1      var2 state city  ind  sample
#1 0.3971165 0.2631346    MG   IT ind3 unified
#2 0.3953296 0.8254704    MG   IT ind3 unified
#3 0.3472372 0.3235779    MG   IT ind3 unified

Any ideas?

Answer 1

My solution requires the purrr and dplyr packages.

I'm not sure how flexible your data is, but the easiest (albeit inelegant) approach would be to squash everything into a single data.frame. Before doing that, we need to preserve the row position within each data.frame so that corresponding rows can be matched later. A simple way to do it:

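library(purrr)  # map()
library(dplyr)  # mutate(), row_number(), bind_rows(), group_by(), summarise()

# Tag every row with its position within its own data.frame
list1 <- list1 %>%
  map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
  map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
  map(~ mutate(.x, rownum = row_number()))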
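
If there are many lists, repeating that block gets tedious. A possible shortcut, sketched here on toy data, is to put the lists in an outer list and map twice. The names toy1, toy2, add_rownum and all_lists are made up for this illustration and do not appear in the question.

```r
library(purrr)
library(dplyr)

# Toy stand-ins for list1, list2, ... (hypothetical values, not the question's data)
toy1 <- list(data.frame(var1 = c(1, 2), ind = "ind1"),
             data.frame(var1 = c(3, 4), ind = "ind2"))
toy2 <- list(data.frame(var1 = c(5, 6), ind = "ind1"),
             data.frame(var1 = c(7, 8), ind = "ind2"))

# Hypothetical helper: tag each row with its position inside its data.frame
add_rownum <- function(df) mutate(df, rownum = row_number())

# The outer map() walks over the lists, the inner map() over the data.frames
all_lists <- map(list(toy1, toy2), function(lst) map(lst, add_rownum))

all_lists[[2]][[1]]
```

Every data.frame in every list then carries its own rownum column, exactly as in the three-block version.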

Then we simply squash the lists into one data.frame:

df <- dplyr::bind_rows(list1, list2, list3)

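Then group by row number and the identifying columns, average var1 and var2 within each group, and set sample to the new value: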

df1 <- df %>%
  group_by(rownum, ind, state, city) %>%
  summarise(var1 = mean(var1), var2 = mean(var2)) %>%
  mutate(sample = "unified")
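
If the real data has more numeric columns than var1 and var2, listing each one in summarise() does not scale well. A minimal sketch of a more general version, assuming dplyr 1.0.0 or later (for across() and the .groups argument) and using a small made-up data.frame in place of the stacked df:

```r
library(dplyr)

# Hypothetical stacked data: two samples of one individual, already tagged with rownum
df <- data.frame(
  var1   = c(1, 2, 3, 4),
  var2   = c(10, 20, 30, 40),
  state  = "MG",
  city   = "BH",
  ind    = "ind1",
  sample = rep(c("sample_1", "sample_2"), each = 2),
  rownum = c(1L, 2L, 1L, 2L)
)

unified <- df %>%
  group_by(rownum, ind, state, city) %>%
  # across() skips grouping columns, so rownum itself is not averaged;
  # .groups = "drop" returns an ungrouped tibble
  summarise(across(where(is.numeric), mean), .groups = "drop") %>%
  mutate(sample = "unified")

unified
```

Here row 1 averages var1 over 1 and 3, and row 2 over 2 and 4. Note that this variant drops the grouping, so the printed result would not show a "# Groups:" header.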

Finally, use the split() function from base R to turn the result back into a list, one element per individual:

df1 %>% split(df1$ind)

You get a list of three tibbles, one per individual. The values will differ from run to run because the dummy data comes from runif():

$ind1
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind1  MG    BH    0.433 0.647 unified
2      2 ind1  MG    BH    0.617 0.253 unified
3      3 ind1  MG    BH    0.316 0.372 unified

$ind2
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind2  MG    MC    0.854 0.500 unified
2      2 ind2  MG    MC    0.274 0.518 unified
3      3 ind2  MG    MC    0.515 0.309 unified

$ind3
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind3  MG    IT    0.259 0.507 unified
2      2 ind3  MG    IT    0.147 0.487 unified
3      3 ind3  MG    IT    0.126 0.562 unified

You can remove rownum before splitting if you don't want it in the result.
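
A rough sketch of that last cleanup step, using a made-up tibble shaped like df1 rather than the real result. Converting to a plain data.frame and dropping the list names are optional extras, assuming you want something close to the unnamed list shown in the question.

```r
library(dplyr)

# Hypothetical summarised result, shaped like df1 in the answer
df1 <- tibble(
  rownum = c(1L, 2L, 1L, 2L),
  ind    = c("ind1", "ind1", "ind2", "ind2"),
  var1   = c(0.1, 0.2, 0.3, 0.4),
  sample = "unified"
)

list_unified <- df1 %>%
  ungroup() %>%          # clears any grouping left over from summarise()
  select(-rownum) %>%    # drop the helper column
  as.data.frame() %>%    # plain data.frame instead of a tibble
  split(.$ind) %>%       # one element per individual
  unname()               # positional list: [[1]], [[2]], ...

list_unified
```

Keep the names (skip unname()) if you would rather look individuals up as list_unified$ind1.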

solution1
0 2020-08-08 03:54:13