Merging corresponding data.frames in separate lists in R

Please consider the dummy data below:

df1.1 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("BH", 3), ind = rep("ind1", 3), sample = rep("sample_1", 3))

df1.2 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("MC", 3), ind = rep("ind2", 3), sample = rep("sample_1", 3))

df1.3 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("IT", 3), ind = rep("ind3", 3), sample = rep("sample_1", 3))

list1 <- list(df1.1, df1.2, df1.3)

#> list1
#[[1]]
#        var1       var2 state city  ind   sample
#1 0.91851330 0.37539222    MG   BH ind1 sample_1
#2 0.07248773 0.28406666    MG   BH ind1 sample_1
#3 0.66276294 0.09738144    MG   BH ind1 sample_1
#
#[[2]]
#        var1      var2 state city  ind   sample
#1 0.03620023 0.3837086    MG   MC ind2 sample_1
#2 0.81407863 0.4763247    MG   MC ind2 sample_1
#3 0.61538142 0.4526425    MG   MC ind2 sample_1
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.1249893 0.0918184    MG   IT ind3 sample_1
#2 0.1323642 0.7891568    MG   IT ind3 sample_1
#3 0.7305105 0.2438753    MG   IT ind3 sample_1
#

df2.1 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("BH", 3), ind = rep("ind1", 3), sample = rep("sample_2", 3))

df2.2 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("MC", 3), ind = rep("ind2", 3), sample = rep("sample_2", 3))

df2.3 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("IT", 3), ind = rep("ind3", 3), sample = rep("sample_2", 3))

list2 <- list(df2.1, df2.2, df2.3)

#> list2
#[[1]]
#        var1      var2 state city  ind   sample
#1 0.01054156 0.3740587    MG   BH ind1 sample_2
#2 0.24489289 0.6290580    MG   BH ind1 sample_2
#3 0.36355003 0.2140268    MG   BH ind1 sample_2
#
#[[2]]
#       var1      var2 state city  ind   sample
#1 0.2904603 0.1390745    MG   MC ind2 sample_2
#2 0.3843579 0.8289106    MG   MC ind2 sample_2
#3 0.4403131 0.6055418    MG   MC ind2 sample_2
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.4711878 0.1148234    MG   IT ind3 sample_2
#2 0.4038921 0.3908316    MG   IT ind3 sample_2
#3 0.3886416 0.9038296    MG   IT ind3 sample_2

df3.1 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("BH", 3), ind = rep("ind1", 3), sample = rep("sample_3", 3))

df3.2 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("MC", 3), ind = rep("ind2", 3), sample = rep("sample_3", 3))

df3.3 <- data.frame(var1 = runif(3), var2 = runif(3), state = rep("MG", 3), city = rep("IT", 3), ind = rep("ind3", 3), sample = rep("sample_3", 3))

list3 <- list(df3.1, df3.2, df3.3)

#> list3
#[[1]]
#       var1      var2 state city  ind   sample
#1 0.2672011 0.5336193    MG   BH ind1 sample_3
#2 0.4413970 0.8593835    MG   BH ind1 sample_3
#3 0.3981449 0.6585343    MG   BH ind1 sample_3
#
#[[2]]
#       var1       var2 state city  ind   sample
#1 0.5090785 0.88560620    MG   MC ind2 sample_3
#2 0.1666667 0.08849541    MG   MC ind2 sample_3
#3 0.5226845 0.41225280    MG   MC ind2 sample_3
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.7137117 0.3715057    MG   IT ind3 sample_3
#2 0.9605454 0.9443209    MG   IT ind3 sample_3
#3 0.1546365 0.6869942    MG   IT ind3 sample_3

My goal is to unify the three lists into a single one. The data for each individual (ind) should end up in a single data.frame.

For numeric variables such as var1 and var2, I want the result to be the average value of each row across the samples.

For variables like state, city and ind, I want the values to be kept (they are the same in every list).

The variable sample has a different category in each list (sample_1, sample_2, sample_3). I would like to assign a new value to this variable in the unified data.frame.

The result I'm aiming for would look like the example below:

#> list_unified
#[[1]]
#       var1      var2 state city  ind  sample
#1 0.4590084 0.4549876    MG   BH ind1 unified
#2 0.1899593 0.4472606    MG   BH ind1 unified
#3 0.7441010 0.1136819    MG   BH ind1 unified
#
#[[2]]
#       var1      var2 state city  ind  sample
#1 0.5445125 0.1096332    MG   MC ind2 unified
#2 0.4039724 0.4898337    MG   MC ind2 unified
#3 0.9519204 0.1769643    MG   MC ind2 unified
#
#[[3]]
#       var1      var2 state city  ind  sample
#1 0.3971165 0.2631346    MG   IT ind3 unified
#2 0.3953296 0.8254704    MG   IT ind3 unified
#3 0.3472372 0.3235779    MG   IT ind3 unified

Any ideas?

My solution requires the purrr and dplyr packages.

Not sure how inflexible your data is, but the easiest (albeit inelegant) approach would be to squash everything into a single data.frame. But first, we need to preserve the row information, so that row 1 of one sample can later be matched with row 1 of the others. My simple way to do it would be:

library(purrr)
library(dplyr)

# Tag each row with its position inside its own data frame
list1 <- list1 %>%
  map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
  map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
  map(~ mutate(.x, rownum = row_number()))
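If you have more than three samples, repeating that call for every list gets tedious. Here is a minimal sketch of the same tagging step done in one pass over a list of lists. The helper `make_list()` and the name `all_lists` are made up for this example (they only rebuild dummy data shaped like the question's); with your real data you would start from `list(list1, list2, list3)`.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: rebuilds one list of three data frames like the question's.
make_list <- function(sample_name) {
  unname(Map(function(city, ind) {
    data.frame(var1 = runif(3), var2 = runif(3), state = "MG",
               city = city, ind = ind, sample = sample_name)
  }, c("BH", "MC", "IT"), c("ind1", "ind2", "ind3")))
}

# Stand-in for list(list1, list2, list3)
all_lists <- list(make_list("sample_1"), make_list("sample_2"), make_list("sample_3"))

# Outer map walks the sample lists, inner map walks the data frames in each list.
all_lists <- map(all_lists, function(lst) {
  map(lst, function(d) mutate(d, rownum = row_number()))
})
```

Since `bind_rows()` also accepts lists of data frames, `bind_rows(all_lists)` should then give the same stacked data frame as the next step.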

Then we simply squash the lists into a single data frame:

df <- dplyr::bind_rows(list1, list2, list3)

And then do some dplyr magic:

df1 <- df %>%
  group_by(rownum, ind, state, city) %>%
  summarise(var1 = mean(var1), var2 = mean(var2)) %>%
  mutate(sample = "unified")
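If there are more numeric columns than var1 and var2, naming each one in `summarise()` does not scale. A possible variant, assuming dplyr 1.0.0 or later, uses `across(where(is.numeric), mean)` and also drops the leftover grouping with `.groups = "drop"`. The data frame built here is only a stand-in for the stacked `df` above, and `df_unified` is a name chosen for this sketch.

```r
library(dplyr)

# Hypothetical stand-in for the stacked data: 3 samples x 3 individuals x 3 rows.
df <- expand.grid(rownum = 1:3,
                  ind = c("ind1", "ind2", "ind3"),
                  sample = c("sample_1", "sample_2", "sample_3"),
                  stringsAsFactors = FALSE)
df$state <- "MG"
df$city  <- unname(c(ind1 = "BH", ind2 = "MC", ind3 = "IT")[df$ind])
df$var1  <- runif(nrow(df))
df$var2  <- runif(nrow(df))

df_unified <- df %>%
  group_by(rownum, ind, state, city) %>%
  # Average every non-grouping numeric column; the character column
  # `sample` is not selected and so drops out of the result.
  summarise(across(where(is.numeric), mean), .groups = "drop") %>%
  mutate(sample = "unified")
```

The result is an ungrouped tibble with one row per (rownum, ind) pair, so later steps such as `select()` will not have grouping columns silently added back.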

And finally, use the split() function from base R to turn the result back into a list:

df1 %>% split(df1$ind)

And you get a list with one data frame per individual:

$ind1
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind1  MG    BH    0.433 0.647 unified
2      2 ind1  MG    BH    0.617 0.253 unified
3      3 ind1  MG    BH    0.316 0.372 unified

$ind2
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind2  MG    MC    0.854 0.500 unified
2      2 ind2  MG    MC    0.274 0.518 unified
3      3 ind2  MG    MC    0.515 0.309 unified

$ind3
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind3  MG    IT    0.259 0.507 unified
2      2 ind3  MG    IT    0.147 0.487 unified
3      3 ind3  MG    IT    0.126 0.562 unified

You can remove rownum before splitting if you don't want it in the result.
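To get closer to the layout asked for in the question (no rownum, original column order, an unnamed list), something like the following should work. The tibble below is only a hypothetical stand-in for the summarised `df1`, and `tidy` and `list_unified` are names chosen for this sketch.

```r
library(dplyr)

# Hypothetical stand-in for the summarised result (df1 in the answer).
df1 <- tibble(
  rownum = rep(1:3, times = 3),
  ind    = rep(c("ind1", "ind2", "ind3"), each = 3),
  state  = "MG",
  city   = rep(c("BH", "MC", "IT"), each = 3),
  var1   = runif(9),
  var2   = runif(9),
  sample = "unified"
)

tidy <- df1 %>%
  ungroup() %>%               # no-op here; matters when df1 is still grouped
  arrange(ind, rownum) %>%    # keep rows in their original order
  select(var1, var2, state, city, ind, sample)  # drops rownum, restores column order

# split() names the elements ind1, ind2, ind3; unname() gives [[1]], [[2]], [[3]]
list_unified <- unname(split(tidy, tidy$ind))
```

The `ungroup()` call is there because `select()` on a grouped data frame keeps the grouping columns, so rownum would otherwise be added back.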
