Join list of dataframes in sequence using full_join in R
I have a list of dataframes with similar variable names that I'm looking to join using full_join in the order in which they appear in the list.
library(tidyverse)
x <- data.frame(id = c("a","a","b","b","b","c","c","c","c"),
                sub.id = c("1","2","1","2","3","1","2","3","4"))
y <- data.frame(id = as.character(rep(1:4, each = 2)),
                sub.id = c("AA","CC","DD","AA","GG","OO","PP","OW"))
z <- data.frame(id = c("AA","CC","DD","GG","OO","OW","PP"),
                sub.id = as.character(1:7))
dfs <- list(x, y, z)
I've tried using reduce from the purrr package, but this joins every dataframe in the list to the first dataframe, in this case the x dataframe.
dfs %>%
reduce(full_join,by = c("sub.id" = "id"))
Is there a way to perform a full_join on the dataframes in a list such that the by follows the sequence in which the dataframes appear in the list? In this example the sub.id of x would match the id of y, and then the sub.id from y after joining would match the id of z for the final join.
EDIT: The expected result of this should be similar to the following:
id sub.id.x sub.id.y sub.id.y.y
1 a 1 AA 1
2 a 1 CC 2
3 a 2 DD 3
4 a 2 AA 1
5 b 1 AA 1
6 b 1 CC 2
7 b 2 DD 3
8 b 2 AA 1
9 b 3 GG 4
10 b 3 OO 5
11 c 1 AA 1
12 c 1 CC 2
13 c 2 DD 3
14 c 2 AA 1
15 c 3 GG 4
16 c 3 OO 5
17 c 4 PP 7
18 c 4 OW 6
The joined column name suffixes are left unchanged for now.
Perhaps we need a for loop that renames the columns of the output after each join:
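out <- dfs[[1]]
# seq_along(dfs)[-1] is empty for a one-element list, unlike 2:length(dfs)
for (i in seq_along(dfs)[-1]) {
  # match the current sub.id against the id of the next dataframe
  out <- full_join(out, dfs[[i]], by = c('sub.id' = 'id'))
  # keep the column that was just used as the key under a numbered name
  names(out)[names(out) == 'sub.id'] <- paste0("sub.id", i)
  # the newly joined sub.id (suffixed .y) becomes the key for the next join
  names(out)[names(out) == 'sub.id.y'] <- 'sub.id'
}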
-output
out
# id sub.id2 sub.id3 sub.id
#1 a 1 AA 1
#2 a 1 CC 2
#3 a 2 DD 3
#4 a 2 AA 1
#5 b 1 AA 1
#6 b 1 CC 2
#7 b 2 DD 3
#8 b 2 AA 1
#9 b 3 GG 4
#10 b 3 OO 5
#11 c 1 AA 1
#12 c 1 CC 2
#13 c 2 DD 3
#14 c 2 AA 1
#15 c 3 GG 4
#16 c 3 OO 5
#17 c 4 PP 7
#18 c 4 OW 6
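A variation on the same idea, offered only as a sketch: rather than renaming after each join, the key columns can be renamed up front so that each dataframe's id carries the same name as the previous dataframe's sub.id. The joins then chain on the one column name two neighbours share. The names key1, key2, ... and the objects renamed and out2 are made up for this illustration. The sketch assumes every dataframe has exactly the columns id and sub.id, as in the question. Recent dplyr versions may warn about a many-to-many relationship on the first join.

```r
library(dplyr)
library(purrr)

x <- data.frame(id = c("a","a","b","b","b","c","c","c","c"),
                sub.id = c("1","2","1","2","3","1","2","3","4"))
y <- data.frame(id = as.character(rep(1:4, each = 2)),
                sub.id = c("AA","CC","DD","AA","GG","OO","PP","OW"))
z <- data.frame(id = c("AA","CC","DD","GG","OO","OW","PP"),
                sub.id = as.character(1:7))
dfs <- list(x, y, z)

# Rename so the frames become (id, key1), (key1, key2), (key2, key3), ...
# "key<i>" is a hypothetical naming scheme chosen for this sketch.
renamed <- lapply(seq_along(dfs), function(i) {
  df <- dfs[[i]]
  names(df)[names(df) == "sub.id"] <- paste0("key", i)
  if (i > 1) names(df)[names(df) == "id"] <- paste0("key", i - 1)
  df
})

# Each join uses the single column name shared by the accumulated result
# and the next dataframe, so the key follows the order of the list.
out2 <- reduce(renamed, function(acc, nxt) {
  full_join(acc, nxt, by = intersect(names(acc), names(nxt)))
})

names(out2)
```

Because no two non-key columns share a name, no .x/.y suffixes are generated, and the column names record which dataframe each value came from.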
If we can assume that the joining column is always the last column of the first dataframe and the first column of the second dataframe, then you could do the following.
In base R:
Reduce(function(x,y) merge(x,y,by.x = tail(names(x),1), by.y = names(y)[1], all = TRUE), dfs)
sub.id1 sub.id0 id sub.id11
1 AA 1 a 1
2 AA 2 a 1
3 AA 1 c 1
4 AA 2 b 1
5 AA 1 b 1
6 AA 2 c 1
7 CC 1 a 2
8 CC 1 b 2
9 CC 1 c 2
10 DD 2 b 3
11 DD 2 a 3
12 DD 2 c 3
13 GG 3 b 4
14 GG 3 c 4
15 OO 3 b 5
16 OO 3 c 5
17 OW 4 c 6
18 PP 4 c 7