Merging corresponding data.frames in separate lists in R
Please consider the dummy data below:
df1.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_1", 3)))
df1.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_1", 3)))
df1.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_1", 3)))
list1 <- list(df1.1, df1.2, df1.3)
#> list1
#[[1]]
# var1 var2 state city ind sample
#1 0.91851330 0.37539222 MG BH ind1 sample_1
#2 0.07248773 0.28406666 MG BH ind1 sample_1
#3 0.66276294 0.09738144 MG BH ind1 sample_1
#
#[[2]]
# var1 var2 state city ind sample
#1 0.03620023 0.3837086 MG MC ind2 sample_1
#2 0.81407863 0.4763247 MG MC ind2 sample_1
#3 0.61538142 0.4526425 MG MC ind2 sample_1
#
#[[3]]
# var1 var2 state city ind sample
#1 0.1249893 0.0918184 MG IT ind3 sample_1
#2 0.1323642 0.7891568 MG IT ind3 sample_1
#3 0.7305105 0.2438753 MG IT ind3 sample_1
#
df2.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_2", 3)))
df2.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_2", 3)))
df2.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_2", 3)))
list2 <- list(df2.1, df2.2, df2.3)
#> list2
#[[1]]
# var1 var2 state city ind sample
#1 0.01054156 0.3740587 MG BH ind1 sample_2
#2 0.24489289 0.6290580 MG BH ind1 sample_2
#3 0.36355003 0.2140268 MG BH ind1 sample_2
#
#[[2]]
# var1 var2 state city ind sample
#1 0.2904603 0.1390745 MG MC ind2 sample_2
#2 0.3843579 0.8289106 MG MC ind2 sample_2
#3 0.4403131 0.6055418 MG MC ind2 sample_2
#
#[[3]]
# var1 var2 state city ind sample
#1 0.4711878 0.1148234 MG IT ind3 sample_2
#2 0.4038921 0.3908316 MG IT ind3 sample_2
#3 0.3886416 0.9038296 MG IT ind3 sample_2
df3.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_3", 3)))
df3.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_3", 3)))
df3.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_3", 3)))
list3 <- list(df3.1, df3.2, df3.3)
#> list3
#[[1]]
# var1 var2 state city ind sample
#1 0.2672011 0.5336193 MG BH ind1 sample_3
#2 0.4413970 0.8593835 MG BH ind1 sample_3
#3 0.3981449 0.6585343 MG BH ind1 sample_3
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5090785 0.88560620 MG MC ind2 sample_3
#2 0.1666667 0.08849541 MG MC ind2 sample_3
#3 0.5226845 0.41225280 MG MC ind2 sample_3
#
#[[3]]
# var1 var2 state city ind sample
#1 0.7137117 0.3715057 MG IT ind3 sample_3
#2 0.9605454 0.9443209 MG IT ind3 sample_3
#3 0.1546365 0.6869942 MG IT ind3 sample_3
My goal is to unify the three lists into a single one. The data for each individual (ind) will be summarised in a single data.frame.
For numeric variables such as var1 and var2, I want the result to be the average of each row across samples.
For variables like state, city and ind, I want the values to be kept (they are the same in every list).
The variable sample has a different category in each list (sample_1, sample_2, sample_3). I would like to assign a new value to this variable in the unified data.frame.
The result I'm aiming for would look like the example below:
#> list_unified
#[[1]]
# var1 var2 state city ind sample
#1 0.4590084 0.4549876 MG BH ind1 unified
#2 0.1899593 0.4472606 MG BH ind1 unified
#3 0.7441010 0.1136819 MG BH ind1 unified
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5445125 0.1096332 MG MC ind2 unified
#2 0.4039724 0.4898337 MG MC ind2 unified
#3 0.9519204 0.1769643 MG MC ind2 unified
#
#[[3]]
# var1 var2 state city ind sample
#1 0.3971165 0.2631346 MG IT ind3 unified
#2 0.3953296 0.8254704 MG IT ind3 unified
#3 0.3472372 0.3235779 MG IT ind3 unified
Any ideas?
My solution requires the purrr and dplyr packages.
Not sure how rigid your data is, but the easiest (albeit inelegant) approach would be to squash everything into a single data.frame. But first we need to preserve the row information, and my dumb way to do it would be:
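library(purrr)
library(dplyr)
# add a row index to every data.frame so rows can be matched across samples
list1 <- list1 %>%
  map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
  map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
  map(~ mutate(.x, rownum = row_number()))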
Then we simply squash the lists into one data.frame:
df <- dplyr::bind_rows(list1, list2, list3)
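If you have more than a few lists, the two steps above can probably be folded into one pass. Here is a minimal sketch, assuming small made-up data (the make_df helper and its values are only for illustration, not your real data):

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in data: two samples, two individuals, two rows each
make_df <- function(ind, sample, offset) {
  data.frame(var1 = c(1, 2) + offset, var2 = c(10, 20) + offset,
             state = "MG", ind = ind, sample = sample)
}
list1 <- list(make_df("ind1", "sample_1", 0), make_df("ind2", "sample_1", 0))
list2 <- list(make_df("ind1", "sample_2", 2), make_df("ind2", "sample_2", 2))

# c() concatenates the lists into one flat list of data.frames;
# each data.frame then gets its own row index before everything is stacked
stacked <- c(list1, list2) %>%
  map(~ mutate(.x, rownum = row_number())) %>%
  bind_rows()
```

This avoids repeating the same map() call for every list, at the cost of hiding which list each data.frame came from (the sample column still records that).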
And then do the dplyr magic:
df1 <- df %>%
  group_by(rownum, ind, state, city) %>%
  summarise(var1 = mean(var1), var2 = mean(var2)) %>%
  mutate(sample = "unified")
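If there are more numeric columns than var1 and var2, naming each one gets tedious. A possible variation, assuming dplyr 1.0.0 or later and a small made-up data.frame, averages every non-grouping numeric column and drops the grouping afterwards:

```r
library(dplyr)

# Hypothetical stacked data: one individual, two samples, two rows each
df <- data.frame(
  var1 = c(1, 2, 3, 4), var2 = c(10, 20, 30, 40),
  state = "MG", city = "BH", ind = "ind1",
  sample = rep(c("sample_1", "sample_2"), each = 2),
  rownum = c(1L, 2L, 1L, 2L)
)

unified <- df %>%
  group_by(rownum, ind, state, city) %>%
  # across() skips grouping columns, so rownum is not averaged
  summarise(across(where(is.numeric), mean), .groups = "drop") %>%
  mutate(sample = "unified")
```

With .groups = "drop" the result is an ungrouped tibble, so it will not carry the "Groups:" header that the version above prints.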
And finally, use the split() function from base R to turn the result back into a list:
df1 %>% split(df1$ind)
And you get a list with one element per individual:
$ind1
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind1 MG BH 0.433 0.647 unified
2 2 ind1 MG BH 0.617 0.253 unified
3 3 ind1 MG BH 0.316 0.372 unified
$ind2
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind2 MG MC 0.854 0.500 unified
2 2 ind2 MG MC 0.274 0.518 unified
3 3 ind2 MG MC 0.515 0.309 unified
$ind3
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind3 MG IT 0.259 0.507 unified
2 2 ind3 MG IT 0.147 0.487 unified
3 3 ind3 MG IT 0.126 0.562 unified
You can remove rownum before splitting if you don't want it.
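To get closer to the unnamed list of plain data.frames shown in the question, something along these lines might work. It is only a sketch on a made-up summarised result, not your actual df1:

```r
library(dplyr)

# Hypothetical summarised result for two individuals
df1 <- data.frame(
  rownum = c(1L, 2L, 1L, 2L),
  ind = c("ind1", "ind1", "ind2", "ind2"),
  var1 = c(0.1, 0.2, 0.3, 0.4),
  sample = "unified"
)

list_unified <- df1 %>%
  ungroup() %>%             # no-op here; matters if df1 is still grouped
  arrange(ind, rownum) %>%  # keep rows in their original order
  select(-rownum) %>%       # drop the helper column
  as.data.frame() %>%       # plain data.frame instead of a tibble
  split(.$ind) %>%
  unname()                  # [[1]], [[2]] instead of $ind1, $ind2
```

Leave out unname() if you would rather look elements up by individual, e.g. list_unified$ind1.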