Please consider the dummy data below:
df1.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_1", 3)))
df1.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_1", 3)))
df1.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_1", 3)))
list1 <- list(df1.1, df1.2, df1.3)
#> list1
#[[1]]
# var1 var2 state city ind sample
#1 0.91851330 0.37539222 MG BH ind1 sample_1
#2 0.07248773 0.28406666 MG BH ind1 sample_1
#3 0.66276294 0.09738144 MG BH ind1 sample_1
#
#[[2]]
# var1 var2 state city ind sample
#1 0.03620023 0.3837086 MG MC ind2 sample_1
#2 0.81407863 0.4763247 MG MC ind2 sample_1
#3 0.61538142 0.4526425 MG MC ind2 sample_1
#
#[[3]]
# var1 var2 state city ind sample
#1 0.1249893 0.0918184 MG IT ind3 sample_1
#2 0.1323642 0.7891568 MG IT ind3 sample_1
#3 0.7305105 0.2438753 MG IT ind3 sample_1
#
df2.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_2", 3)))
df2.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_2", 3)))
df2.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_2", 3)))
list2 <- list(df2.1, df2.2, df2.3)
#> list2
#[[1]]
# var1 var2 state city ind sample
#1 0.01054156 0.3740587 MG BH ind1 sample_2
#2 0.24489289 0.6290580 MG BH ind1 sample_2
#3 0.36355003 0.2140268 MG BH ind1 sample_2
#
#[[2]]
# var1 var2 state city ind sample
#1 0.2904603 0.1390745 MG MC ind2 sample_2
#2 0.3843579 0.8289106 MG MC ind2 sample_2
#3 0.4403131 0.6055418 MG MC ind2 sample_2
#
#[[3]]
# var1 var2 state city ind sample
#1 0.4711878 0.1148234 MG IT ind3 sample_2
#2 0.4038921 0.3908316 MG IT ind3 sample_2
#3 0.3886416 0.9038296 MG IT ind3 sample_2
df3.1 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("BH", 3)), ind = c( rep("ind1", 3)), sample = c( rep("sample_3", 3)))
df3.2 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("MC", 3)), ind = c( rep("ind2", 3)), sample = c( rep("sample_3", 3)))
df3.3 <-data.frame(var1 = runif(3), var2 = runif(3), state = c( rep("MG", 3)), city = c( rep("IT", 3)), ind = c( rep("ind3", 3)), sample = c( rep("sample_3", 3)))
list3 <- list(df3.1, df3.2, df3.3)
#> list3
#[[1]]
# var1 var2 state city ind sample
#1 0.2672011 0.5336193 MG BH ind1 sample_3
#2 0.4413970 0.8593835 MG BH ind1 sample_3
#3 0.3981449 0.6585343 MG BH ind1 sample_3
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5090785 0.88560620 MG MC ind2 sample_3
#2 0.1666667 0.08849541 MG MC ind2 sample_3
#3 0.5226845 0.41225280 MG MC ind2 sample_3
#
#[[3]]
# var1 var2 state city ind sample
#1 0.7137117 0.3715057 MG IT ind3 sample_3
#2 0.9605454 0.9443209 MG IT ind3 sample_3
#3 0.1546365 0.6869942 MG IT ind3 sample_3
My goal is to unify the three lists into a single one. The data of each individual (ind) will be combined into a single data.frame.
For numeric variables such as var1 and var2, I want the result to be the average value of each row across samples.
For variables like state, city and ind, I want the values to be kept (they are the same in every list).
The variable sample has a different category in each list (sample_1, sample_2, sample_3). I would like to assign a new value to this variable in the unified data.frame.
The result I'm aiming for would look like the example below:
#> list_unified
#[[1]]
# var1 var2 state city ind sample
#1 0.4590084 0.4549876 MG BH ind1 unified
#2 0.1899593 0.4472606 MG BH ind1 unified
#3 0.7441010 0.1136819 MG BH ind1 unified
#
#[[2]]
# var1 var2 state city ind sample
#1 0.5445125 0.1096332 MG MC ind2 unified
#2 0.4039724 0.4898337 MG MC ind2 unified
#3 0.9519204 0.1769643 MG MC ind2 unified
#
#[[3]]
# var1 var2 state city ind sample
#1 0.3971165 0.2631346 MG IT ind3 unified
#2 0.3953296 0.8254704 MG IT ind3 unified
#3 0.3472372 0.3235779 MG IT ind3 unified
Any ideas?
My solution requires the purrr and dplyr packages.
I'm not sure how inflexible your data is, but the easiest (albeit inelegant) approach is to squash everything into one data.frame. First, though, we need to preserve the row information so that row 1 of one sample can be matched with row 1 of the others. My simple way to do that is:
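library(purrr)
library(dplyr)

# Add a row index to every data frame so rows can be matched across samples
list1 <- list1 %>%
  map(~ mutate(.x, rownum = row_number()))
list2 <- list2 %>%
  map(~ mutate(.x, rownum = row_number()))
list3 <- list3 %>%
  map(~ mutate(.x, rownum = row_number()))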
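The three near-identical calls above could also be written as one nested map. The following is only a sketch under assumptions: make_sample() and all_lists are hypothetical names that are not part of the original data, and the stand-in data is smaller than the data in the question.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: builds one "sample" as a list of two small data frames
make_sample <- function(sample_name) {
  list(
    data.frame(var1 = runif(3), var2 = runif(3), ind = "ind1", sample = sample_name),
    data.frame(var1 = runif(3), var2 = runif(3), ind = "ind2", sample = sample_name)
  )
}

# Hypothetical container holding all samples (stands in for list1, list2, list3)
all_lists <- list(make_sample("sample_1"), make_sample("sample_2"))

# Outer map walks over the samples, inner map over the data frames in each sample
all_lists <- map(all_lists, function(lst) {
  map(lst, ~ mutate(.x, rownum = row_number()))
})
```

Keeping the samples in one outer list means a fourth sample would need no extra code, and bind_rows() accepts the nested result once it is flattened one level.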
Then we simply squash the lists into a dataframe:
df <- dplyr::bind_rows(list1, list2, list3)
Then let dplyr do the grouping and averaging:
df1 <- df %>%
  group_by(rownum, ind, state, city) %>%
  summarise(var1 = mean(var1), var2 = mean(var2)) %>%
  mutate(sample = "unified")
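If there are more numeric columns than var1 and var2, naming each one in summarise() gets tedious. A possible variation, assuming dplyr 1.0.0 or later, averages every numeric non-grouping column with across(). The small data frame below uses made-up values so the result is easy to check by hand.

```r
library(dplyr)

# Made-up stand-in for the combined data frame: one individual, two samples, two rows each
df <- data.frame(
  var1 = c(1, 2, 3, 4),
  var2 = c(10, 20, 30, 40),
  ind = "ind1", state = "MG", city = "BH",
  sample = c("sample_1", "sample_1", "sample_2", "sample_2"),
  rownum = c(1L, 2L, 1L, 2L)
)

unified <- df %>%
  group_by(rownum, ind, state, city) %>%
  # rownum is a grouping column, so across() leaves it alone and averages var1 and var2
  summarise(across(where(is.numeric), mean), .groups = "drop") %>%
  mutate(sample = "unified")
```

Here .groups = "drop" returns an ungrouped tibble, which differs from the grouped output shown in this answer but avoids carrying the grouping into later steps.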
Finally, use the split() function from base R to turn the result back into a list:
df1 %>% split(df1$ind)
And you get a list of three individuals...
$ind1
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind1 MG BH 0.433 0.647 unified
2 2 ind1 MG BH 0.617 0.253 unified
3 3 ind1 MG BH 0.316 0.372 unified
$ind2
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind2 MG MC 0.854 0.500 unified
2 2 ind2 MG MC 0.274 0.518 unified
3 3 ind2 MG MC 0.515 0.309 unified
$ind3
# A tibble: 3 x 7
# Groups: rownum, ind, state [3]
rownum ind state city var1 var2 sample
<int> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 1 ind3 MG IT 0.259 0.507 unified
2 2 ind3 MG IT 0.147 0.487 unified
3 3 ind3 MG IT 0.126 0.562 unified
You can remove rownum before splitting if you don't want it.
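To get closer to the layout requested in the question (no rownum, original column order, an unnamed list of plain data frames), something along these lines might work. It is a sketch on made-up values; tidy and list_unified are illustrative names.

```r
library(dplyr)

# Made-up stand-in for the summarised result (two individuals, two rows each)
unified <- tibble(
  rownum = c(1L, 2L, 1L, 2L),
  ind = c("ind1", "ind1", "ind2", "ind2"),
  state = "MG",
  city = c("BH", "BH", "MC", "MC"),
  var1 = c(0.1, 0.2, 0.3, 0.4),
  var2 = c(0.5, 0.6, 0.7, 0.8),
  sample = "unified"
)

# Keep rows in their original order, then drop rownum and restore the column order
tidy <- unified %>%
  ungroup() %>%
  arrange(ind, rownum) %>%
  select(var1, var2, state, city, ind, sample)

# Split by individual, drop the list names, and convert each tibble to a data.frame
list_unified <- tidy %>%
  split(tidy$ind) %>%
  unname() %>%
  lapply(as.data.frame)
```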