Merging corresponding data.frames in separate lists in R
Consider the following dummy data:
df1.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_1", 3)))
df1.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_1", 3)))
df1.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_1", 3)))
list1 <- list(df1.1, df1.2, df1.3)
#> list1
#[[1]]
# var1 var2 state city ind sample
#1 0.91851330 0.37539222 MG BH ind1 sample_1
#2 0.07248773 0.28406666 MG BH ind1 sample_1
#3 0.66276294 0.09738144 MG BH ind1 sample_1
#
#[[2]]
# var1 var2 state city ind sample
#1 0.03620023 0.3837086 MG MC ind2 sample_1
#2 0.81407863 0.4763247 MG MC ind2 sample_1
#3 0.61538142 0.4526425 MG MC ind2 sample_1
#
#[[3]]
# var1 var2 state city ind sample
#1 0.1249893 0.0918184 MG IT ind3 sample_1
#2 0.1323642 0.7891568 MG IT ind3 sample_1
#3 0.7305105 0.2438753 MG IT ind3 sample_1
#
df2.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_2", 3)))
df2.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_2", 3)))
df2.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_2", 3)))
list2 <- list(df2.1, df2.2, df2.3)
#> list2
#[[1]]
# var1 var2 state city ind sample
#1 0.01054156 0.3740587 MG BH ind1 sample_2
#2 0.24489289 0.6290580 MG BH ind1 sample_2
#3 0.36355003 0.2140268 MG BH ind1 sample_2
#
#[[2]]
# var1 var2 state city ind sample
#1 0.2904603 0.1390745 MG MC ind2 sample_2
#2 0.3843579 0.8289106 MG MC ind2 sample_2
#3 0.4403131 0.6055418 MG MC ind2 sample_2
#
#[[3]]
# var1 var2 state city ind sample
#1 0.4711878 0.1148234 MG IT ind3 sample_2
#2 0.4038921 0.3908316 MG IT ind3 sample_2
#3 0.3886416 0.9038296 MG IT ind3 sample_2
df3.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_3", 3)))
df3.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_3", 3)))
df3.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_3", 3)))
list3 <- list(df3.1, df3.2, df3.3)
#> list3
#[[1]]
# var1 var2 state city ind sample
#1 0.2672011 0.5336193 MG BH ind1 sample_3
#2 0.4413970 0.8593835 MG BH ind1 sample_3
#3 0.3981449 0.6585343 MG BH ind1 sample_3
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5090785 0.88560620 MG MC ind2 sample_3
#2 0.1666667 0.08849541 MG MC ind2 sample_3
#3 0.5226845 0.41225280 MG MC ind2 sample_3
#
#[[3]]
# var1 var2 state city ind sample
#1 0.7137117 0.3715057 MG IT ind3 sample_3
#2 0.9605454 0.9443209 MG IT ind3 sample_3
#3 0.1546365 0.6869942 MG IT ind3 sample_3
My goal is to unify the three lists into a single list. The data for each individual (ind) should be combined into one data.frame.
For numeric variables such as var1 and var2, I want the result to be the mean of each row across the samples.
For variables such as state, city and ind, I want to keep the values (they are identical in every list).
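The variable sample has a different category in each list (sample_1, sample_2, sample_3). I want to assign a new value to this variable in the unified data.frame.
What I am aiming for looks like the following example: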
#> list_unified
#[[1]]
# var1 var2 state city ind sample
#1 0.4590084 0.4549876 MG BH ind1 unified
#2 0.1899593 0.4472606 MG BH ind1 unified
#3 0.7441010 0.1136819 MG BH ind1 unified
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5445125 0.1096332 MG MC ind2 unified
#2 0.4039724 0.4898337 MG MC ind2 unified
#3 0.9519204 0.1769643 MG MC ind2 unified
#
#[[3]]
# var1 var2 state city ind sample
#1 0.3971165 0.2631346 MG IT ind3 unified
#2 0.3953296 0.8254704 MG IT ind3 unified
#3 0.3472372 0.3235779 MG IT ind3 unified
Any ideas?
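My solution requires the purrr and dplyr packages.
I'm not sure how inflexible your data is, but the simplest (though inelegant) approach is to flatten everything into a single data.frame. First, though, we need to preserve each row's position within its data.frame, and my crude way of doing that is: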
library(dplyr)
library(purrr)
# Tag every row with its position inside its own data.frame
list1 <- list1 %>%
map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
map(~ mutate(.x, rownum = row_number()))
Then we simply flatten the lists into one data frame:
df <- dplyr::bind_rows(list1, list2, list3)
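The tagging and stacking steps can also be written as a single pipeline. The following is only a minimal sketch on hypothetical toy data (list_a, list_b and stacked are illustrative names, not objects from the question), assuming every data.frame has the same columns:

```r
library(dplyr)
library(purrr)

# Hypothetical toy data: two "sample" lists, each holding two data.frames
list_a <- list(
  data.frame(var1 = c(1, 2), ind = "ind1", sample = "sample_1"),
  data.frame(var1 = c(3, 4), ind = "ind2", sample = "sample_1")
)
list_b <- list(
  data.frame(var1 = c(5, 6), ind = "ind1", sample = "sample_2"),
  data.frame(var1 = c(7, 8), ind = "ind2", sample = "sample_2")
)

# c() concatenates the lists into one list of four data.frames;
# each data.frame gets its own row counter before everything is stacked
stacked <- c(list_a, list_b) %>%
  map(~ mutate(.x, rownum = row_number())) %>%
  bind_rows()

stacked
```

Because rownum is computed before stacking, it restarts at 1 in every data.frame, which is what later lets rows at the same position be averaged together.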
Then comes the dplyr magic:
df1 <- df %>%
group_by(rownum, ind, state, city) %>%
summarise(var1 = mean(var1), var2 = mean(var2)) %>%
mutate(sample = "unified")
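To see what the grouping does, here is a small sketch on hypothetical data (stacked and unified are illustrative names). It is a variant rather than the code above: it assumes dplyr 1.0 or later so that across() and the .groups argument are available, and it drops the grouping after summarising.

```r
library(dplyr)

# Hypothetical stacked data: one individual, two samples, two rows per sample
stacked <- data.frame(
  var1   = c(1, 2, 3, 4),
  var2   = c(10, 20, 30, 40),
  ind    = "ind1",
  rownum = c(1L, 2L, 1L, 2L),
  sample = c("sample_1", "sample_1", "sample_2", "sample_2")
)

unified <- stacked %>%
  group_by(rownum, ind) %>%
  # Average every numeric column of interest within each (rownum, ind) group;
  # .groups = "drop" returns an ungrouped tibble
  summarise(across(c(var1, var2), mean), .groups = "drop") %>%
  # summarise() dropped the old sample column, so a new one is added here
  mutate(sample = "unified")

unified
```

Row 1 of sample_1 is averaged with row 1 of sample_2, and likewise for row 2, so var1 comes out as 2 and 3 and var2 as 20 and 30.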
Finally, use the split() function from base to turn them back into a list:
df1 %>% split(df1$ind)
You get a list with the three individuals...
$ind1
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind1 MG BH 0.433 0.647 unified
2 2 ind1 MG BH 0.617 0.253 unified
3 3 ind1 MG BH 0.316 0.372 unified
$ind2
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind2 MG MC 0.854 0.500 unified
2 2 ind2 MG MC 0.274 0.518 unified
3 3 ind2 MG MC 0.515 0.309 unified
$ind3
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind3 MG IT 0.259 0.507 unified
2 2 ind3 MG IT 0.147 0.487 unified
3 3 ind3 MG IT 0.126 0.562 unified
If you don't like rownum, you can drop it before splitting.
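Dropping rownum could look roughly like the sketch below. The summarised tibble is a hypothetical stand-in for the grouped result, not the actual data. Note that rownum is a grouping variable at that point, so the sketch calls ungroup() first; otherwise select() would keep the grouping column.

```r
library(dplyr)

# Hypothetical stand-in for the grouped, summarised result
summarised <- tibble(
  rownum = c(1L, 2L, 1L, 2L),
  ind    = c("ind1", "ind1", "ind2", "ind2"),
  var1   = c(0.1, 0.2, 0.3, 0.4),
  sample = "unified"
) %>%
  group_by(rownum, ind)

# Remove the grouping, then the helper column
flat <- summarised %>%
  ungroup() %>%
  select(-rownum)

# One list element per individual, named after the values of ind
list_unified <- split(flat, flat$ind)

names(list_unified)
```

If an unnamed list like the one in the question is preferred, wrapping the result in unname() should remove the ind1, ind2 names.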