Using for loops to match pairs of data frames in R

I want to merge pairs of data frames using a particular function, for many pairs of files in one directory. I am trying to write a for loop that will do this job for me. Related questions such as "Merge several data.frames into one data.frame with a loop" are helpful, but I am struggling to adapt their example loops to this particular use.

My files end with either "_df1.csv" or "_df2.csv". Each pair that I wish to merge into an output data frame has an identical number at the beginning of the file name (i.e. 543_df1.csv and 543_df2.csv).

I have created a character vector for each of the two types of file in my directory using the list.files command, as below:

df1files <- list.files(path="~/Desktop/combined files", pattern="_df1\\.csv$", full.names=TRUE, recursive=FALSE)
df2files <- list.files(path="~/Desktop/combined files", pattern="_df2\\.csv$", full.names=TRUE, recursive=FALSE)
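Because the two vectors are built independently, it may be worth checking that position i in each refers to the same numeric prefix before merging. The following is a minimal sketch of one way to do that; the file names are made-up stand-ins for the output of list.files, and ids1 and ids2 are hypothetical helper names not taken from the question.

```r
# Hypothetical file names standing in for the output of list.files()
df1files <- c("~/Desktop/combined files/543_df1.csv",
              "~/Desktop/combined files/87_df1.csv")
df2files <- c("~/Desktop/combined files/87_df2.csv",
              "~/Desktop/combined files/543_df2.csv")

# Extract the ID that precedes "_df1.csv" / "_df2.csv"
ids1 <- sub("_df1\\.csv$", "", basename(df1files))
ids2 <- sub("_df2\\.csv$", "", basename(df2files))

# Reorder df2files so that position i in both vectors refers to the same ID
df2files <- df2files[match(ids1, ids2)]

# Stop early if any df1 file has no df2 partner
stopifnot(!anyNA(df2files))
```

After this step, df1files[i] and df2files[i] share the same prefix for every i, so the two vectors can be walked in parallel.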

The function and commands that I want to apply in order to merge each pair of data frames are as follows:

findRow <- function(dt, df) { min(which(df$datetime > dt )) }
rows <- sapply(df2$datetime, findRow, df=df1)
merged <- cbind(df2, df1[rows,])
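To make the behaviour of these commands concrete, here is a small worked example on toy data. The timestamps and the column x are invented for illustration, and the sketch assumes datetime is stored as POSIXct (a column read with read.csv would first need converting, for example with as.POSIXct).

```r
# Toy data; the values and the column 'x' are made up for illustration
df1 <- data.frame(
  datetime = as.POSIXct(c("2020-01-01 10:00:00",
                          "2020-01-01 11:00:00",
                          "2020-01-01 12:00:00"), tz = "UTC"),
  x = c(1, 2, 3)
)
df2 <- data.frame(
  datetime = as.POSIXct(c("2020-01-01 09:30:00",
                          "2020-01-01 10:30:00"), tz = "UTC")
)

# For each df2 time, find the first df1 row with a strictly later time
findRow <- function(dt, df) { min(which(df$datetime > dt)) }
rows <- sapply(df2$datetime, findRow, df = df1)

# Attach the matched df1 rows to df2, side by side
merged <- cbind(df2, df1[rows, ])
```

Here the 09:30 row of df2 is paired with the 10:00 row of df1, and the 10:30 row with the 11:00 row. Note that if a df2 time is later than every time in df1, which() returns an empty vector and min() then returns Inf with a warning, so that case may need handling separately.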

I am now trying to incorporate these commands into a for loop, starting with something along the following lines, so that I do not have to merge the pairs manually:

for(i in 1:length(df2files)){ ……

I am not yet a strong R programmer, and have hit a wall, so any help would be greatly appreciated.

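My intuition (which I haven't had a chance to check) is that you should be able to do something like the following. This assumes df1files and df2files list the pairs in the same order, which holds when every file has its partner, since list.files() sorts its results alphabetically: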

# read in the data as two lists of dataframes:
dfs1 <- lapply(df1files, read.csv)
dfs2 <- lapply(df2files, read.csv)

# define your merge commands as a function
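merge2 <- function(df1, df2){
    # index of the first df1 row whose datetime is later than dt
    findRow <- function(dt, df) { min(which(df$datetime > dt )) }
    rows <- sapply(df2$datetime, findRow, df=df1)
    # return df2 with the matched df1 rows bound alongside
    cbind(df2, df1[rows,])
}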

# apply that merge command pairwise to the two lists of data frames
mergeddfs <- mapply(merge2, dfs1, dfs2, SIMPLIFY=FALSE)

# write results to files
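# replace only the "_df1.csv" suffix, so a "df1" elsewhere in the path is untouched
outfilenames <- sub("_df1\\.csv$", "_merged.csv", df1files)
invisible(mapply(function(x,y) write.csv(x,y), mergeddfs, outfilenames))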
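Since the question asked for a for loop specifically, the same steps can also be sketched as an explicit loop. The example below is self-contained so it can be run as is: it writes two toy pairs of files to a temporary directory first. The directory, the file contents, the column x, the numeric datetime values, and the "_merged.csv" output suffix are all assumptions made for illustration.

```r
# Self-contained demo: write two toy pairs to a temporary directory
# (directory, contents and the column 'x' are made up for illustration)
dir <- tempfile("combined")
dir.create(dir)
for (id in c("87", "543")) {
  write.csv(data.frame(datetime = c(10, 20, 30), x = 1:3),
            file.path(dir, paste0(id, "_df1.csv")), row.names = FALSE)
  write.csv(data.frame(datetime = c(5, 15)),
            file.path(dir, paste0(id, "_df2.csv")), row.names = FALSE)
}

df1files <- list.files(dir, pattern = "_df1\\.csv$", full.names = TRUE)

# Index of the first df row whose datetime is later than dt
findRow <- function(dt, df) { min(which(df$datetime > dt)) }

for (i in seq_along(df1files)) {
  # Derive the partner and output names from the df1 name,
  # so the two files in a pair always share the same prefix
  f1  <- df1files[i]
  f2  <- sub("_df1\\.csv$", "_df2.csv", f1)
  out <- sub("_df1\\.csv$", "_merged.csv", f1)

  df1 <- read.csv(f1)
  df2 <- read.csv(f2)

  rows   <- sapply(df2$datetime, findRow, df = df1)
  merged <- cbind(df2, df1[rows, ])
  write.csv(merged, out, row.names = FALSE)
}
```

Using seq_along(df1files) instead of 1:length(df1files) avoids a spurious iteration when no files are found, and building the df2 name from the df1 name means the loop does not depend on two separately sorted vectors staying aligned.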
