Merging corresponding data.frames in separate lists in R
Consider the following dummy data:
df1.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_1", 3)))
df1.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_1", 3)))
df1.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_1", 3)))
list1 <- list(df1.1, df1.2, df1.3)
#> list1
#[[1]]
# var1 var2 state city ind sample
#1 0.91851330 0.37539222 MG BH ind1 sample_1
#2 0.07248773 0.28406666 MG BH ind1 sample_1
#3 0.66276294 0.09738144 MG BH ind1 sample_1
#
#[[2]]
# var1 var2 state city ind sample
#1 0.03620023 0.3837086 MG MC ind2 sample_1
#2 0.81407863 0.4763247 MG MC ind2 sample_1
#3 0.61538142 0.4526425 MG MC ind2 sample_1
#
#[[3]]
# var1 var2 state city ind sample
#1 0.1249893 0.0918184 MG IT ind3 sample_1
#2 0.1323642 0.7891568 MG IT ind3 sample_1
#3 0.7305105 0.2438753 MG IT ind3 sample_1
#
df2.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_2", 3)))
df2.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_2", 3)))
df2.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_2", 3)))
list2 <- list(df2.1, df2.2, df2.3)
#> list2
#[[1]]
# var1 var2 state city ind sample
#1 0.01054156 0.3740587 MG BH ind1 sample_2
#2 0.24489289 0.6290580 MG BH ind1 sample_2
#3 0.36355003 0.2140268 MG BH ind1 sample_2
#
#[[2]]
# var1 var2 state city ind sample
#1 0.2904603 0.1390745 MG MC ind2 sample_2
#2 0.3843579 0.8289106 MG MC ind2 sample_2
#3 0.4403131 0.6055418 MG MC ind2 sample_2
#
#[[3]]
# var1 var2 state city ind sample
#1 0.4711878 0.1148234 MG IT ind3 sample_2
#2 0.4038921 0.3908316 MG IT ind3 sample_2
#3 0.3886416 0.9038296 MG IT ind3 sample_2
df3.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_3", 3)))
df3.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_3", 3)))
df3.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_3", 3)))
list3 <- list(df3.1, df3.2, df3.3)
#> list3
#[[1]]
# var1 var2 state city ind sample
#1 0.2672011 0.5336193 MG BH ind1 sample_3
#2 0.4413970 0.8593835 MG BH ind1 sample_3
#3 0.3981449 0.6585343 MG BH ind1 sample_3
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5090785 0.88560620 MG MC ind2 sample_3
#2 0.1666667 0.08849541 MG MC ind2 sample_3
#3 0.5226845 0.41225280 MG MC ind2 sample_3
#
#[[3]]
# var1 var2 state city ind sample
#1 0.7137117 0.3715057 MG IT ind3 sample_3
#2 0.9605454 0.9443209 MG IT ind3 sample_3
#3 0.1546365 0.6869942 MG IT ind3 sample_3
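My goal is to merge the three lists into a single list, with the data for each individual (ind) combined into one data.frame.
For numeric variables such as var1 and var2, I want the result to be the row-wise mean across the samples.
For variables such as state, city and ind, I want to keep the values (they are the same in every list).
The variable sample has a different category in each list (sample_1, sample_2, sample_3). In the unified data.frame I want to assign a new value to this variable.
What I am aiming for looks like the following example: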
#> list_unified
#[[1]]
# var1 var2 state city ind sample
#1 0.4590084 0.4549876 MG BH ind1 unified
#2 0.1899593 0.4472606 MG BH ind1 unified
#3 0.7441010 0.1136819 MG BH ind1 unified
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5445125 0.1096332 MG MC ind2 unified
#2 0.4039724 0.4898337 MG MC ind2 unified
#3 0.9519204 0.1769643 MG MC ind2 unified
#
#[[3]]
# var1 var2 state city ind sample
#1 0.3971165 0.2631346 MG IT ind3 unified
#2 0.3953296 0.8254704 MG IT ind3 unified
#3 0.3472372 0.3235779 MG IT ind3 unified
Any ideas?
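My solution requires the purrr and dplyr packages.
I'm not sure how rigid your data is, but the simplest (if inelegant) approach is to collapse everything into a single data.frame. First, though, we need to preserve the row information, and my crude way of doing that is: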
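library(purrr)
library(dplyr)
# Add a row index to every data.frame so rows can be matched across samples
list1 <- list1 %>%
  map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
  map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
  map(~ mutate(.x, rownum = row_number()))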
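The three near-identical calls above could be folded into one helper. The following is only a sketch: add_rownum, make_df, list_a and list_b are hypothetical names introduced for illustration, and the small stand-in data replaces list1, list2 and list3.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: add a row index to every data.frame in a list
add_rownum <- function(lst) {
  map(lst, ~ mutate(.x, rownum = row_number()))
}

# Small stand-in data: two "sample" lists with two individuals each
make_df <- function(ind, sample) {
  data.frame(var1 = runif(3), var2 = runif(3), ind = ind, sample = sample)
}
list_a <- list(make_df("ind1", "sample_1"), make_df("ind2", "sample_1"))
list_b <- list(make_df("ind1", "sample_2"), make_df("ind2", "sample_2"))

# Apply the helper to every sample list in one go
tagged <- map(list(list_a, list_b), add_rownum)
```

Each element of tagged is one of the original lists with a rownum column added to every data.frame, so the rest of the approach stays the same.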
Then we simply collapse the lists into one data frame:
df <- dplyr::bind_rows(list1, list2, list3)
Then do the dplyr magic:
df1 <- df %>%
  group_by(rownum, ind, state, city) %>%
  summarise(var1 = mean(var1), var2 = mean(var2)) %>%
  mutate(sample = "unified")
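As a variation on the summarise step, and assuming dplyr 1.0.0 or later, across() can average several numeric columns at once and .groups = "drop" returns an ungrouped result. This is a sketch on small deterministic stand-in data (two samples of one individual), not the data from the question.

```r
library(dplyr)

# Deterministic stand-in for the stacked data: two samples of one individual
df <- data.frame(
  var1   = c(1, 2, 3, 3, 4, 5),
  var2   = c(10, 20, 30, 30, 40, 50),
  state  = "MG",
  city   = "BH",
  ind    = "ind1",
  sample = rep(c("sample_1", "sample_2"), each = 3),
  rownum = rep(1:3, times = 2)
)

unified <- df %>%
  group_by(rownum, ind, state, city) %>%
  # Average every listed numeric column within each row position
  summarise(across(c(var1, var2), mean), .groups = "drop") %>%
  mutate(sample = "unified")
```

Here row 1 of the result averages the first rows of both samples, and so on for rows 2 and 3. Because the result is no longer grouped, the printed tibble would not carry a "Groups:" header.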
Finally, use the split() function from base to turn them back into a list:
df1 %>% split(df1$ind)
You will get a list with the three individuals...
$ind1
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind1 MG BH 0.433 0.647 unified
2 2 ind1 MG BH 0.617 0.253 unified
3 3 ind1 MG BH 0.316 0.372 unified
$ind2
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind2 MG MC 0.854 0.500 unified
2 2 ind2 MG MC 0.274 0.518 unified
3 3 ind2 MG MC 0.515 0.309 unified
$ind3
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind3 MG IT 0.259 0.507 unified
2 2 ind3 MG IT 0.147 0.487 unified
3 3 ind3 MG IT 0.126 0.562 unified
If you don't like it, you can drop rownum before splitting.
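Dropping rownum before the split might look like the sketch below. The df1 here is a small stand-in for the summarised result, and flat is a hypothetical intermediate name. Note the ungroup() call: on a grouped tibble, select() would otherwise keep the grouping columns, rownum included.

```r
library(dplyr)

# Stand-in for the summarised, still-grouped result of the earlier steps
df1 <- data.frame(
  rownum = rep(1:3, times = 2),
  ind    = rep(c("ind1", "ind2"), each = 3),
  state  = "MG",
  city   = rep(c("BH", "MC"), each = 3),
  var1   = (1:6) / 10,
  var2   = (6:1) / 10,
  sample = "unified"
) %>%
  group_by(rownum, ind, state)

flat <- df1 %>%
  ungroup() %>%                                      # release the grouping first
  select(var1, var2, state, city, ind, sample) %>%   # drops rownum, restores column order
  as.data.frame()

# One data.frame per individual, named by ind
list_unified <- split(flat, flat$ind)
```

The resulting list elements have the same columns, in the same order, as the desired output in the question. They are named by ind rather than numbered, and unname(list_unified) would remove the names if positional indexing is preferred.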