R: multiple joins from a list of data frames of differing lengths and differing keys
Let's say I've got this list of data frames:
library(tidyverse)
df_list <- list(data.frame(cheese = c("ex", "ok", "bd"),
                           cheese_val = c(3:1),
                           stringsAsFactors = FALSE),
                data.frame(egg = c("great", "good", "bad", "eww"),
                           egg_val = c(4:1),
                           stringsAsFactors = FALSE),
                data.frame(milk = c("good", "bad"),
                           milk_val = c(2:1),
                           stringsAsFactors = FALSE))
And I've got this core data set:
core_dat <- data.frame(cheese = c("ex", "ok", "ok", "bd", "ok"),
                       egg = c("great", "bad", "bad", "eww", "great"),
                       milk = c("good", "good", "good", "bad", "good"),
                       stringsAsFactors = FALSE)
I'd like to get core_dat joined individually with each element of df_list.
I then tried this:
for (i in 1:length(df_list)) {
  gg <- core_dat %>%
    left_join(df_list[[i]], by = names(df_list[[i]][1]), copy = TRUE)
}
which ran, but only applied the join to the milk column, so the only additional column in gg was milk_val. I expected to see cheese_val and egg_val too.
I suspect there are more appropriate options than a for loop here and I am looking for suggestions. Note that my actual data set has many more data frames than this small example.
I should note that I expect the resulting data frame, in this case gg, to contain 6 columns in total (3 with the standard names + 3 with the "val" suffix), so that it looks like the printed version of this:
data.frame(cheese = c("ex", "ok", "ok", "bd", "ok"),
           egg = c("great", "bad", "bad", "eww", "great"),
           milk = c("good", "good", "good", "bad", "good"),
           cheese_val = c(3, 2, 2, 1, 2),
           egg_val = c(4, 2, 2, 1, 4),
           milk_val = c(2, 2, 2, 1, 2))
I've seen many "multiple joins" answers here, but none that quite line up with what I'm trying to accomplish (differing key columns, differing lengths of data).
You can use map to get a list of joined data frames, then use reduce to join them all together.
map(df_list, right_join, rownames_to_column(core_dat)) %>%
  reduce(full_join)
# Joining, by = "cheese"
# Joining, by = "egg"
# Joining, by = "milk"
# Joining, by = c("cheese", "rowname", "egg", "milk")
# Joining, by = c("cheese", "rowname", "egg", "milk")
#   cheese cheese_val rowname   egg milk egg_val milk_val
# 1     ex          3       1 great good       4        2
# 2     ok          2       2   bad good       2        2
# 3     ok          2       3   bad good       2        2
# 4     bd          1       4   eww  bad       1        1
# 5     ok          2       5 great good       4        2
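The same fold can also be written as a single reduce that starts from core_dat and left-joins one lookup table at a time. This is only a sketch, and it assumes (as in the example data) that the join key is the first column of each lookup table. It also shows why the for loop in the question kept only milk_val: each iteration restarted from core_dat instead of from the accumulated result.

```r
library(dplyr)
library(purrr)

# Example data copied from the question.
df_list <- list(
  data.frame(cheese = c("ex", "ok", "bd"), cheese_val = 3:1,
             stringsAsFactors = FALSE),
  data.frame(egg = c("great", "good", "bad", "eww"), egg_val = 4:1,
             stringsAsFactors = FALSE),
  data.frame(milk = c("good", "bad"), milk_val = 2:1,
             stringsAsFactors = FALSE)
)
core_dat <- data.frame(
  cheese = c("ex", "ok", "ok", "bd", "ok"),
  egg    = c("great", "bad", "bad", "eww", "great"),
  milk   = c("good", "good", "good", "bad", "good"),
  stringsAsFactors = FALSE
)

# Fold every lookup table into the accumulated result, starting from core_dat.
# Assumption: the key of each lookup table is its first column.
gg <- reduce(
  df_list,
  function(acc, lookup) left_join(acc, lookup, by = names(lookup)[1]),
  .init = core_dat
)
```

Because left_join keeps the rows of its left-hand table in order, gg should have the rows of core_dat in their original order, with no helper rowname column and no "Joining, by" messages.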
This should give the desired output:
Reduce(merge, c(df_list, list(core_dat)))
  cheese   egg milk cheese_val egg_val milk_val
1     bd   eww  bad          1       1        1
2     ex great good          3       4        2
3     ok   bad good          2       2        2
4     ok   bad good          2       2        2
5     ok great good          2       4        2
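Two caveats with this approach are worth keeping in mind. The lookup tables share no columns with each other, so the intermediate merge calls build Cartesian products before core_dat is merged in, which may get expensive with many data frames. merge also sorts by the key columns by default, so the rows no longer follow the order of core_dat. A base R sketch that avoids both is below. It assumes each lookup table has exactly two columns (key first, value second) and that keys are unique within a table.

```r
# Example data copied from the question.
df_list <- list(
  data.frame(cheese = c("ex", "ok", "bd"), cheese_val = 3:1,
             stringsAsFactors = FALSE),
  data.frame(egg = c("great", "good", "bad", "eww"), egg_val = 4:1,
             stringsAsFactors = FALSE),
  data.frame(milk = c("good", "bad"), milk_val = 2:1,
             stringsAsFactors = FALSE)
)
core_dat <- data.frame(
  cheese = c("ex", "ok", "ok", "bd", "ok"),
  egg    = c("great", "bad", "bad", "eww", "great"),
  milk   = c("good", "good", "good", "bad", "good"),
  stringsAsFactors = FALSE
)

gg <- core_dat
for (lookup in df_list) {
  key <- names(lookup)[1]  # assumption: first column is the key
  val <- names(lookup)[2]  # assumption: second column is the value
  # match() returns, for each row of gg, the position of its key in the lookup
  # table (NA if absent), so indexing by it works like a left join.
  gg[[val]] <- lookup[[val]][match(gg[[key]], lookup[[key]])]
}
```

Here gg keeps the five rows of core_dat in their original order and gains the cheese_val, egg_val, and milk_val columns.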