
Merging corresponding data.frames in separate lists in R

Consider the following dummy data:

df1.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_1", 3)))

df1.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_1", 3)))

df1.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_1", 3)))

list1 <- list(df1.1, df1.2, df1.3)

#> list1
#[[1]]
#        var1       var2 state city  ind   sample
#1 0.91851330 0.37539222    MG   BH ind1 sample_1
#2 0.07248773 0.28406666    MG   BH ind1 sample_1
#3 0.66276294 0.09738144    MG   BH ind1 sample_1
#
#[[2]]
#        var1      var2 state city  ind   sample
#1 0.03620023 0.3837086    MG   MC ind2 sample_1
#2 0.81407863 0.4763247    MG   MC ind2 sample_1
#3 0.61538142 0.4526425    MG   MC ind2 sample_1
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.1249893 0.0918184    MG   IT ind3 sample_1
#2 0.1323642 0.7891568    MG   IT ind3 sample_1
#3 0.7305105 0.2438753    MG   IT ind3 sample_1
#

df2.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_2", 3)))

df2.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_2", 3)))

df2.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_2", 3)))

list2 <- list(df2.1, df2.2, df2.3)

#> list2
#[[1]]
#        var1      var2 state city  ind   sample
#1 0.01054156 0.3740587    MG   BH ind1 sample_2
#2 0.24489289 0.6290580    MG   BH ind1 sample_2
#3 0.36355003 0.2140268    MG   BH ind1 sample_2
#
#[[2]]
#       var1      var2 state city  ind   sample
#1 0.2904603 0.1390745    MG   MC ind2 sample_2
#2 0.3843579 0.8289106    MG   MC ind2 sample_2
#3 0.4403131 0.6055418    MG   MC ind2 sample_2
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.4711878 0.1148234    MG   IT ind3 sample_2
#2 0.4038921 0.3908316    MG   IT ind3 sample_2
#3 0.3886416 0.9038296    MG   IT ind3 sample_2

df3.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_3", 3)))

df3.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_3", 3)))

df3.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_3", 3)))

list3 <- list(df3.1, df3.2, df3.3)

#> list3
#[[1]]
#       var1      var2 state city  ind   sample
#1 0.2672011 0.5336193    MG   BH ind1 sample_3
#2 0.4413970 0.8593835    MG   BH ind1 sample_3
#3 0.3981449 0.6585343    MG   BH ind1 sample_3
#
#[[2]]
#       var1       var2 state city  ind   sample
#1 0.5090785 0.88560620    MG   MC ind2 sample_3
#2 0.1666667 0.08849541    MG   MC ind2 sample_3
#3 0.5226845 0.41225280    MG   MC ind2 sample_3
#
#[[3]]
#       var1      var2 state city  ind   sample
#1 0.7137117 0.3715057    MG   IT ind3 sample_3
#2 0.9605454 0.9443209    MG   IT ind3 sample_3
#3 0.1546365 0.6869942    MG   IT ind3 sample_3

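My goal is to merge the three lists into a single list. The data for each individual (ind) should end up in one data.frame.

For numeric variables such as var1 and var2, I want the result to be the mean of each value across the samples.

For variables such as state, city and ind, I want to keep the values (they are the same in every list).

The variable sample has a different category in each list (sample_1, sample_2, sample_3). I want to assign a new value to this variable in the unified data.frame.

What I am aiming for looks like the following example: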

#> list_unified
#[[1]]
#       var1      var2 state city  ind  sample
#1 0.4590084 0.4549876    MG   BH ind1 unified
#2 0.1899593 0.4472606    MG   BH ind1 unified
#3 0.7441010 0.1136819    MG   BH ind1 unified
#
#[[2]]
#       var1      var2 state city  ind  sample
#1 0.5445125 0.1096332    MG   MC ind2 unified
#2 0.4039724 0.4898337    MG   MC ind2 unified
#3 0.9519204 0.1769643    MG   MC ind2 unified
#
#[[3]]
#       var1      var2 state city  ind  sample
#1 0.3971165 0.2631346    MG   IT ind3 unified
#2 0.3953296 0.8254704    MG   IT ind3 unified
#3 0.3472372 0.3235779    MG   IT ind3 unified

Any ideas?

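My solution requires the purrr and dplyr packages.

I am not sure how inflexible your data is, but the simplest (though not elegant) approach is to squash everything into one data.frame. First, though, we need to preserve the row information, and my dumb way of doing that is: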

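library(purrr)
library(dplyr)

# Tag every row with its position so rows can be matched across lists later
list1 <- list1 %>%
  map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
  map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
  map(~ mutate(.x, rownum = row_number()))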
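
The same tagging step is repeated once per list, so it could be wrapped in a small helper. This is only a sketch: add_rownum, toy1 and toy2 are hypothetical names that are not part of the answer, and the toy data stands in for list1 and list2.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: add a row-position column to every data frame in a list
add_rownum <- function(lst) {
  map(lst, ~ mutate(.x, rownum = row_number()))
}

# Small deterministic stand-ins for the lists in the question
toy1 <- list(
  data.frame(var1 = c(1, 2), ind = "ind1"),
  data.frame(var1 = c(3, 4), ind = "ind2")
)
toy2 <- list(
  data.frame(var1 = c(5, 6), ind = "ind1"),
  data.frame(var1 = c(7, 8), ind = "ind2")
)

# Apply the helper to each list in one pass
tagged <- map(list(toy1, toy2), add_rownum)
tagged[[1]][[1]]
```

With more than a handful of lists, keeping them in one outer list like this should avoid copying the same call for each one.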

Then we simply squash the lists into one data frame:

df <- dplyr::bind_rows(list1, list2, list3)

Then do the dplyr magic:

df1 <- df %>%
  group_by(rownum, ind, state, city) %>%
  summarise(var1 = mean(var1), var2 = mean(var2)) %>%
  mutate(sample = "unified")
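
If there are many numeric columns, naming each one inside summarise() gets tedious. A possible variation, assuming dplyr 1.0.0 or later, uses across() to average every numeric column and .groups = "drop" to return an ungrouped result. The toy data frame below is invented for illustration and only mimics the shape of df.

```r
library(dplyr)

# Toy stand-in for the combined data frame: two samples, two rows each
df <- data.frame(
  var1   = c(1, 3, 5, 7),
  var2   = c(2, 4, 6, 8),
  state  = "MG",
  city   = "BH",
  ind    = "ind1",
  sample = c("sample_1", "sample_1", "sample_2", "sample_2"),
  rownum = c(1L, 2L, 1L, 2L)
)

unified <- df %>%
  group_by(rownum, ind, state, city) %>%
  # Grouping columns are skipped by across(), so only var1 and var2 are averaged
  summarise(across(where(is.numeric), mean), .groups = "drop") %>%
  mutate(sample = "unified")

unified
```

Here row 1 averages to var1 = 3 and var2 = 4, and row 2 to var1 = 5 and var2 = 6. Unlike the version above, the result carries no leftover grouping.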

Finally, use the split() function from base to turn them back into a list:

df1 %>% split(df1$ind)

You will get a list of three individuals...

$ind1
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind1  MG    BH    0.433 0.647 unified
2      2 ind1  MG    BH    0.617 0.253 unified
3      3 ind1  MG    BH    0.316 0.372 unified

$ind2
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind2  MG    MC    0.854 0.500 unified
2      2 ind2  MG    MC    0.274 0.518 unified
3      3 ind2  MG    MC    0.515 0.309 unified

$ind3
# A tibble: 3 x 7
# Groups:   rownum, ind, state [3]
  rownum ind   state city   var1  var2 sample 
   <int> <chr> <chr> <chr> <dbl> <dbl> <chr>  
1      1 ind3  MG    IT    0.259 0.507 unified
2      2 ind3  MG    IT    0.147 0.487 unified
3      3 ind3  MG    IT    0.126 0.562 unified

If you don't like it, you can drop rownum before splitting.
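
A sketch of dropping rownum, with made-up data in place of df1. One thing to watch: as the "Groups:" line in the output shows, the summarised result is still grouped by rownum, and select() keeps grouping columns, so ungroup() has to come first.

```r
library(dplyr)

# Toy stand-in for df1, still grouped as it would be after summarise()
df1 <- data.frame(
  rownum = c(1L, 2L, 1L, 2L),
  ind    = c("ind1", "ind1", "ind2", "ind2"),
  var1   = c(0.1, 0.2, 0.3, 0.4),
  sample = "unified"
) %>%
  group_by(rownum, ind)

# Remove the grouping first; otherwise select() would add rownum back
cleaned <- df1 %>%
  ungroup() %>%
  select(-rownum)

# One data frame per individual, named by ind
list_unified <- split(cleaned, cleaned$ind)
names(list_unified)
```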
