简体   繁体   English

R-如何合并多个不同长度的数据帧?

[英]R- How to merge multiple dataframes of different lengths?

I have been stuck with this issue for a while now. 我已经被这个问题困扰了一段时间了。 Need some help. 需要一些帮助。 I am reading the following files (which can be mire than 3 files files) into a dataframe. 我正在将以下文件(可能超过3个文件)读入数据帧。 My input files look like the following: file1: 我的输入文件如下所示:file1:

someName    someMOD someID
A   T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P) 1
B   S495(P) 2
C   S162(P),Q159(D) 3
D   S45(P),C47(C),S48(P),S26(P) 4
E   S18(P)  5

file2: 文件2:

someName    someMOD someID
C   S162(P),Q159(D) 3
D   S45(P),C47(C),S48(P),S26(P) 4
F   S182(P) 6
E   S18(P)  5
Z   Q100(P) 9
A   T754(P),M691(O),S694(P),S739(P),S740(P) 1

file3: 文件3:

someName    someMOD someID
A   T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P) 1
B   S495(P) 2
D   S45(P),C47(C),S48(P),S26(P) 4
E   S18(P)  5
F   S182(P) 6
L   Z182(P) 8
C   S162(P),Q159(D) 3

My Code: 我的代码:

  fileList <- dir(pattern="*.xls")
  i<-1
  j<-1
  a<-list()
  mybigtable<-data.frame

  for (f in 1:length(fileList)){
    fileName <- fileList[f]
    X <-read.xls(fileName)
    if(regexpr("Drug_Rep", fileName)[1]>0){
      a[[i]]<-X
     }
      i=i+1
    }
    else{
        #Don't do anything
    }
  }

  #Now i want to merge my dataframes
  mymerge <- function(x, y)
    merge(x, y, by=c("someName", "someID"), all=TRUE))

  Reduce(mymerge,a) #passing my list of dataframes 'a'

I did dput() on my 'a' list: 我在'a'列表上做了dput():

list(structure(list(someName = structure(c(1L, 2L, 4L, 5L, 6L, 
7L, 3L), .Label = c("A", "B", "C", "D", "E", "F", "L"), class = "factor"), 
    someMOD = structure(c(6L, 5L, 4L, 2L, 3L, 7L, 1L), .Label = c("S162(P),Q159(D)", 
    "S18(P)", "S182(P)", "S45(P),C47(C),S48(P),S26(P)", "S495(P)", 
    "T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P)", 
    "Z182(P)"), class = "factor"), someID = c(1L, 2L, 4L, 5L, 
    6L, 8L, 3L)), .Names = c("someName", "someMOD", "someID"), class = "data.frame", row.names = c(NA, 
-7L)), structure(list(someName = structure(1:5, .Label = c("A", 
"B", "C", "D", "E"), class = "factor"), someMOD = structure(c(5L, 
4L, 1L, 3L, 2L), .Label = c("S162(P),Q159(D)", "S18(P)", "S45(P),C47(C),S48(P),S26(P)", 
"S495(P)", "T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P)"
), class = "factor"), someID = 1:5), .Names = c("someName", "someMOD", 
"someID"), class = "data.frame", row.names = c(NA, -5L)), structure(list(
    someName = structure(c(2L, 3L, 5L, 4L, 6L, 1L), .Label = c("A", 
    "C", "D", "E", "F", "Z"), class = "factor"), someMOD = structure(c(2L, 
    5L, 4L, 3L, 1L, 6L), .Label = c("Q100(P)", "S162(P),Q159(D)", 
    "S18(P)", "S182(P)", "S45(P),C47(C),S48(P),S26(P)", "T754(P),M691(O),S694(P),S739(P),S740(P)"
    ), class = "factor"), someID = c(3L, 4L, 6L, 5L, 9L, 1L)), .Names = c("someName", 
"someMOD", "someID"), class = "data.frame", row.names = c(NA, 
-6L)))

What is my mistake in populating a list? 我在填充列表时犯了什么错误? Any help is really appreciated. 任何帮助都非常感谢。 I am just trying to get an out put like the following: 我只是想像下面这样放出产品:

在此处输入图片说明

The problem with the code I gave you before is that merge gets confused if there are any duplicate column names, and you're merging more than 3 datasets. 我之前给您的代码的问题是,如果有重复的列名, merge变得混乱, 并且您要合并3个以上的数据集。 You'll have to rename your someMOD columns so they don't clash. 您必须重命名someMOD列,这样它们才不会发生冲突。 A for loop works as well as anything for this purpose. for循环以及用于此目的的任何方法都有效。

dupvars <- which(!names(a[[1]]) %in% c("someName", "someID"))
for(i in seq_along(a))
    names(a[[i]])[dupvars] <- paste0(names(a[[i]])[dupvars], i)

# and then merge
Reduce(mymerge, a)

Perhaps the problem is that you're actually not trying to merge in the standard sense, but reshape . 也许问题在于您实际上不是在尝试按标准含义进行merge ,而是进行reshape In this case, you can rbind all the data.frame s together after adding a "time" variable, and use dcast from "reshape2" to get what you're after: 在这种情况下,您可以rbind所有data.frame加入了“时间”变量之后一起s,而使用dcast从“reshape2”让你以后:

Add a "time" variable and rbind the data.frame s together 添加一个“时间”变量和rbinddata.frame s ^在一起

temp <- do.call(rbind, 
                lapply(seq_along(a), 
                       function(x) data.frame(a[[x]], time = x)))
head(temp)
#   someName                                                 someMOD someID time
# 1        A T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P)      1    1
# 2        B                                                 S495(P)      2    1
# 3        D                             S45(P),C47(C),S48(P),S26(P)      4    1
# 4        E                                                  S18(P)      5    1
# 5        F                                                 S182(P)      6    1
# 6        L                                                 Z182(P)      8    1

Transform the data.frame from a "long" format to a "wide" format data.frame从“长”格式转换为“宽”格式

library(reshape2)
dcast(temp, someName + someID ~ time, value.var="someMOD")
#   someName someID                                                       1
# 1        A      1 T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P)
# 2        B      2                                                 S495(P)
# 3        C      3                                         S162(P),Q159(D)
# 4        D      4                             S45(P),C47(C),S48(P),S26(P)
# 5        E      5                                                  S18(P)
# 6        F      6                                                 S182(P)
# 7        L      8                                                 Z182(P)
# 8        Z      9                                                    <NA>
#                                                                 2
# 1 T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P)
# 2                                                         S495(P)
# 3                                                 S162(P),Q159(D)
# 4                                     S45(P),C47(C),S48(P),S26(P)
# 5                                                          S18(P)
# 6                                                            <NA>
# 7                                                            <NA>
# 8                                                            <NA>
#                                         3
# 1 T754(P),M691(O),S694(P),S739(P),S740(P)
# 2                                    <NA>
# 3                         S162(P),Q159(D)
# 4             S45(P),C47(C),S48(P),S26(P)
# 5                                  S18(P)
# 6                                 S182(P)
# 7                                    <NA>
# 8                                 Q100(P)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM