
Merging corresponding data.frames in separate lists in R

Please consider the dummy data below:

df1.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_1", 3)))

df1.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_1", 3)))

df1.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_1", 3)))

list1 <- list(df1.1, df1.2, df1.3)

#> list1
#[[1]]
#        var1       var2 state city  ind   sample
#1 0.91851330 0.37539222    MG   BH ind1 sample_1
#2 0.07248773 0.28406666    MG   BH ind1 sample_1
#3 0.66276294 0.09738144    MG   BH ind1 sample_1
#
#[[2]]
#        var1      var2 state city  ind   sample
#1 0.03620023 0.3837086    MG   MC ind2 sample_1
#2 0.81407863 0.4763247    MG   MC ind2 sample_1
#3 0.61538142 0.4526425    MG   MC ind2 sample_1
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.1249893 0.0918184    MG   IT ind3 sample_1
#2 0.1323642 0.7891568    MG   IT ind3 sample_1
#3 0.7305105 0.2438753    MG   IT ind3 sample_1
#

df2.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_2", 3)))

df2.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_2", 3)))

df2.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_2", 3)))

list2 <- list(df2.1, df2.2, df2.3)

#> list2
#[[1]]
#        var1      var2 state city  ind   sample
#1 0.01054156 0.3740587    MG   BH ind1 sample_2
#2 0.24489289 0.6290580    MG   BH ind1 sample_2
#3 0.36355003 0.2140268    MG   BH ind1 sample_2
#
#[[2]]
#       var1      var2 state city  ind   sample
#1 0.2904603 0.1390745    MG   MC ind2 sample_2
#2 0.3843579 0.8289106    MG   MC ind2 sample_2
#3 0.4403131 0.6055418    MG   MC ind2 sample_2
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.4711878 0.1148234    MG   IT ind3 sample_2
#2 0.4038921 0.3908316    MG   IT ind3 sample_2
#3 0.3886416 0.9038296    MG   IT ind3 sample_2

df3.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_3", 3)))

df3.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_3", 3)))

df3.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_3", 3)))

list3 <- list(df3.1, df3.2, df3.3)

#> list3
#[[1]]
#       var1      var2 state city  ind   sample
#1 0.2672011 0.5336193    MG   BH ind1 sample_3
#2 0.4413970 0.8593835    MG   BH ind1 sample_3
#3 0.3981449 0.6585343    MG   BH ind1 sample_3
#
#[[2]]
#       var1       var2 state city  ind   sample
#1 0.5090785 0.88560620    MG   MC ind2 sample_3
#2 0.1666667 0.08849541    MG   MC ind2 sample_3
#3 0.5226845 0.41225280    MG   MC ind2 sample_3
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.7137117 0.3715057    MG   IT ind3 sample_3
#2 0.9605454 0.9443209    MG   IT ind3 sample_3
#3 0.1546365 0.6869942    MG   IT ind3 sample_3

My goal is to merge the three lists into a single one, so that the data for each individual (ind) is combined into one data.frame.

For numeric variables such as var1 and var2, I want the result to be the row-wise average across samples (row 1 of sample_1, sample_2 and sample_3 averaged together, and so on).

For variables like state, city and ind, I want the values to be kept (they are the same in every list).

The variable sample has a different category in each list (sample_1, sample_2, sample_3). I would like to assign a new value to this variable in the unified data.frame.

The result I'm aiming for would look like the example below:

#> list_unified
#[[1]]
#       var1      var2 state city  ind  sample
#1 0.4590084 0.4549876    MG   BH ind1 unified
#2 0.1899593 0.4472606    MG   BH ind1 unified
#3 0.7441010 0.1136819    MG   BH ind1 unified
#
#[[2]]
#       var1      var2 state city  ind  sample
#1 0.5445125 0.1096332    MG   MC ind2 unified
#2 0.4039724 0.4898337    MG   MC ind2 unified
#3 0.9519204 0.1769643    MG   MC ind2 unified
#
#[[3]]
#       var1      var2 state city  ind  sample
#1 0.3971165 0.2631346    MG   IT ind3 unified
#2 0.3953296 0.8254704    MG   IT ind3 unified
#3 0.3472372 0.3235779    MG   IT ind3 unified

Any ideas?

My solution requires the purrr and dplyr packages.

I'm not sure how rigid your data layout is, but the easiest (albeit inelegant) approach is to stack everything into one data.frame. First, though, we need to preserve the row position within each data.frame. A crude way to do that is:

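library(dplyr)
library(purrr)

# Add a rownum column to every data.frame so rows can be matched across samples
list1 <- list1 %>%
  map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
  map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
  map(~ mutate(.x, rownum = row_number()))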
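
The three near-identical calls above could also be collapsed into one by keeping the lists inside an outer list. The following is only a sketch under that assumption: `make_df`, `lists` and `lists_tagged` are hypothetical names, and the small data frames stand in for the ones in the question.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the data frames in the question
make_df <- function(ind, sample) {
  data.frame(var1 = runif(3), var2 = runif(3), ind = ind, sample = sample)
}

# Outer list plays the role of list(list1, list2, ...)
lists <- list(
  list(make_df("ind1", "sample_1"), make_df("ind2", "sample_1")),
  list(make_df("ind1", "sample_2"), make_df("ind2", "sample_2"))
)

# Outer map() walks over the lists, inner map() over each data.frame
lists_tagged <- map(lists, function(dfs) {
  map(dfs, function(df) mutate(df, rownum = row_number()))
})
```

With this layout the number of samples can grow without any further copy-pasting.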

Then we simply stack the lists into a single data.frame:

df <- dplyr::bind_rows(list1, list2, list3)
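
Before averaging, it may be worth checking that the stacked data has the expected shape, namely one row per sample for every combination of individual and row number. A minimal sketch, using a hypothetical helper `mk` and two tiny lists in place of the real ones:

```r
library(dplyr)

# Hypothetical miniature version of the tagged lists
mk <- function(ind, sample) {
  data.frame(var1 = c(1, 2), ind = ind, sample = sample, rownum = 1:2)
}
list1 <- list(mk("ind1", "sample_1"), mk("ind2", "sample_1"))
list2 <- list(mk("ind1", "sample_2"), mk("ind2", "sample_2"))

# bind_rows() accepts lists of data frames and stacks all of them
df <- bind_rows(list1, list2)

# Each (ind, rownum) pair should appear once per sample
check <- count(df, ind, rownum)
print(check)
```

If some pair shows a different count from the others, the data frames do not line up row by row and the averages would mix unequal numbers of observations.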

Then group by row number and the identifying columns, and average the numeric variables:

df1 <- df %>%
  group_by(rownum, ind, state, city) %>%
  summarise(var1 = mean(var1), var2 = mean(var2)) %>%
  mutate(sample = "unified")
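
If there are more numeric columns than var1 and var2, naming each one becomes tedious. One possible variation, assuming dplyr 1.0.0 or later, uses `across()` to average every numeric non-grouping column and `.groups = "drop"` to return an ungrouped result. The data below is a hypothetical stand-in for the stacked data frame.

```r
library(dplyr)

# Hypothetical stacked data: two samples of one individual, two rows each
df <- data.frame(
  var1   = c(1, 2, 3, 4),
  var2   = c(10, 20, 30, 40),
  state  = "MG",
  city   = "BH",
  ind    = "ind1",
  sample = c("sample_1", "sample_1", "sample_2", "sample_2"),
  rownum = c(1L, 2L, 1L, 2L)
)

unified <- df %>%
  group_by(rownum, ind, state, city) %>%
  # Grouping columns are excluded from across(), so rownum is not averaged
  summarise(across(where(is.numeric), mean), .groups = "drop") %>%
  mutate(sample = "unified")
```

Here row 1 of the result averages row 1 of both samples, and row 2 averages row 2 of both samples.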

Finally, use split() from base R to turn the result back into a list:

df1 %>% split(df1$ind)

This gives a list with one element per individual:

$ind1
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind1  MG    BH    0.433 0.647 unified
2      2 ind1  MG    BH    0.617 0.253 unified
3      3 ind1  MG    BH    0.316 0.372 unified

$ind2
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind2  MG    MC    0.854 0.500 unified
2      2 ind2  MG    MC    0.274 0.518 unified
3      3 ind2  MG    MC    0.515 0.309 unified

$ind3
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind3  MG    IT    0.259 0.507 unified
2      2 ind3  MG    IT    0.147 0.487 unified
3      3 ind3  MG    IT    0.126 0.562 unified

You can remove rownum before splitting if you don't want it in the result.
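
One way to drop rownum and get closer to the layout requested in the question (same column order, unnamed list) is sketched below. The data frame is a hypothetical stand-in for the summarised result, and `tidy` and `list_unified` are illustrative names.

```r
library(dplyr)

# Hypothetical summarised result for two individuals
df1 <- data.frame(
  rownum = c(1L, 2L, 1L, 2L),
  ind    = c("ind1", "ind1", "ind2", "ind2"),
  state  = "MG",
  city   = c("BH", "BH", "MC", "MC"),
  var1   = c(0.1, 0.2, 0.3, 0.4),
  var2   = c(0.5, 0.6, 0.7, 0.8),
  sample = "unified"
)

tidy <- df1 %>%
  arrange(ind, rownum) %>%                          # keep original row order
  select(var1, var2, state, city, ind, sample)      # drops rownum, reorders columns

# split() names the elements by ind; unname() gives [[1]], [[2]], ... instead
list_unified <- unname(split(tidy, tidy$ind))
```

Sorting by rownum before dropping it keeps the rows of each individual in their original order.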
