Combining data.frame and list of data.frames with no common variables in R
I have a data frame (D) and a list of data frames (L) that I want to combine into a new data frame. There is one row in D for every data frame in L, and I want to join these data so that each row in D is matched with the corresponding data frame in L and repeated once for each row of that data frame. The data frames in L have different numbers of rows, but they all have the same columns and could easily be combined into a single data frame (e.g., using plyr::rbind.fill). There are no common variables between D and the data frames in L: the only way I know which rows go together is the order in which they appear in D and L.
Here is toy data with the same structure as my data:
# the data frame
D <- data.frame(name = c("john","sally","ben"), age = c(23, 31, 27))
# the list of data frames
john <- data.frame(attempt = 1:3, result = c("fail","fail","fail"))
sally <- data.frame(attempt = 1, result = c("success"))
ben <- data.frame(attempt = 1:5, result = c("fail","fail","success","fail","success"))
L <- list(john, sally, ben)
The naive way I have tried to do this is with a for loop:
# loop to combine data frame and list
new_D <- data.frame()
for (i in 1:nrow(D)) {
  add <- cbind(D[i,], L[[i]])
  new_D <- rbind(new_D, add)
}
It works, but it is very slow and my files are quite large, so it is not practical. What is a cleaner and more efficient way to do this in R?
Name the list elements after the rows of D, convert the list to a single data.table with an index column ("name"), and join it with the original data on the "name" column:
names(L) <- D$name
D2 <- data.table::rbindlist(L, use.names = TRUE, idcol = "name")
D2[D, on = "name"]
#     name attempt  result age
# 1:  john       1    fail  23
# 2:  john       2    fail  23
# 3:  john       3    fail  23
# 4: sally       1 success  31
# 5:   ben       1    fail  27
# 6:   ben       2    fail  27
# 7:   ben       3 success  27
# 8:   ben       4    fail  27
# 9:   ben       5 success  27
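The join on "name" relies on the names in D being unique, whereas the question says that only the order links D and L. A minimal sketch of a positional variant follows, assuming data.table is installed; the column name row_id is a hypothetical helper introduced here, not part of the original answer. When the list is unnamed, rbindlist fills the idcol with the integer position of each element.

```r
library(data.table)

# Toy data from the question
D <- data.frame(name = c("john", "sally", "ben"), age = c(23, 31, 27))
L <- list(
  data.frame(attempt = 1:3, result = c("fail", "fail", "fail")),
  data.frame(attempt = 1, result = "success"),
  data.frame(attempt = 1:5, result = c("fail", "fail", "success", "fail", "success"))
)

# For an unnamed list, idcol records the integer position of each element of L
D2 <- rbindlist(L, use.names = TRUE, idcol = "row_id")

# Give D the same positional key (row_id is an illustrative name), then join on it
D$row_id <- seq_len(nrow(D))
new_D <- D2[D, on = "row_id"]
```

The row_id column can be dropped afterwards if it is not needed.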
We can split D by row sequence and then use Map to cbind each one-row piece with the corresponding dataset in L:
do.call(rbind, Map(cbind, split(D, seq_len(nrow(D))), L))
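A related base R idea, sketched here as an alternative rather than as part of the answer above, is to avoid per-row cbind calls altogether: count the rows of each element of L, repeat each row index of D that many times, and bind once. The names n_rows and idx are illustrative.

```r
# Toy data from the question
D <- data.frame(name = c("john", "sally", "ben"), age = c(23, 31, 27))
L <- list(
  data.frame(attempt = 1:3, result = c("fail", "fail", "fail")),
  data.frame(attempt = 1, result = "success"),
  data.frame(attempt = 1:5, result = c("fail", "fail", "success", "fail", "success"))
)

# Number of rows contributed by each element of L
n_rows <- vapply(L, nrow, integer(1))

# Row i of D is repeated n_rows[i] times, in order
idx <- rep(seq_len(nrow(D)), times = n_rows)

# One rbind over L and one cbind, instead of growing a data frame in a loop
new_D <- cbind(D[idx, , drop = FALSE], do.call(rbind, L))
rownames(new_D) <- NULL
```

For a very long list, do.call(rbind, L) may itself become the bottleneck; swapping in data.table::rbindlist or dplyr::bind_rows for that step is a reasonable option.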
Or set the names of 'L' to the pasted rows of 'D', bind the rows, and separate the resulting id column into two columns:
library(tidyverse)
do.call(paste, c(D, sep = ",")) %>%
set_names(L, .) %>%
bind_rows(.id = 'grp') %>%
separate(grp, into = c('name', 'age'))
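Note that separate splits on any non-alphanumeric character by default, so names containing spaces or non-integer ages would need an explicit sep, and the age column comes back as character. A possible alternative that sidesteps the paste and separate round trip, sketched under the assumption that tidyr (1.0 or later) is installed, is to store L as a list-column of D and unnest it. The column name data is an illustrative choice.

```r
library(tidyr)

# Toy data from the question
D <- data.frame(name = c("john", "sally", "ben"), age = c(23, 31, 27))
L <- list(
  data.frame(attempt = 1:3, result = c("fail", "fail", "fail")),
  data.frame(attempt = 1, result = "success"),
  data.frame(attempt = 1:5, result = c("fail", "fail", "success", "fail", "success"))
)

# Attach L as a list-column: element i of L sits in row i of D
D$data <- L

# unnest() repeats each row of D once per row of its nested data frame
new_D <- unnest(D, cols = data)
```

This keeps the matching purely positional and leaves age numeric.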