
Merging multiple data frames in a directory

I would like to merge multiple data frames in a directory. Some of these data frames have duplicate rows. All data frames have the same column information.

I found the code below on another site; however, I do not know how to modify it so that duplicate rows do not cause an error.

I am getting the following response:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

Here is the code to read in multiple data frames from a single directory. How can I modify it to circumvent the duplicate rows issue?

multmerge = function(mypath){
  filenames=list.files(path=mypath, full.names=TRUE)
  datalist = lapply(filenames, function(x){read.csv(file=x,header=T)})
  Reduce(function(x,y) {merge(x,y)}, datalist)}

mymergeddata <- multmerge("/Users/Danielle/Desktop/Working Directory/Ecuador/datasets to merge")

The Problem

The problem lies not in the merging, but in one or more of the individual CSV files, where you have duplicate row names. Essentially, if you try to do a simple read.csv() on a file that contains duplicate row names, you will get this exact error:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

The Solution

So how do you circumvent it? You could fix the individual CSV files, but that may be more tedious than it sounds if you have, say, 20 CSVs in that directory. What I would suggest instead is to not use row names during the reading process and, if it is really necessary, to set the row names after the reading is done. For example:

multmerge = function(mypath){
  filenames = list.files(path = mypath, full.names = TRUE)
  # row.names = NULL forces sequential row numbering, so duplicates cannot occur
  datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE, row.names = NULL)})
  # append the data frames row-wise
  Reduce(function(x, y) {rbind(x, y)}, datalist)
}

mymergeddata <- multmerge("~/Desktop")
mymergeddata[mymergeddata$Day.Index == "2014-01-07",]


      Day.Index Sessions year
1    2014-01-07       57 2014
1091 2014-01-07       57 2014

See? Two completely identical values in Day.Index, but because they are not row names you do not get an error. If I change the code to use the first column (Day.Index) as row names (by specifying row.names = 1), I can replicate your error:

multmerge = function(mypath){
  filenames = list.files(path = mypath, full.names = TRUE)
  datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE, row.names = 1)})
  Reduce(function(x, y) {rbind(x, y)}, datalist)
}

mymergeddata <- multmerge("~/Desktop")
nrow(mymergeddata)

> Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

A small note: I am using rbind() here to append by row, but you could swap in merge() in its place and the answer would still hold.

Essentially: R requires that the row names of a data frame be unique.
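To see that uniqueness rule in isolation, here is a minimal sketch (the toy data frame and names are made up for illustration): assigning duplicate row names to a data frame fails, while make.unique() can de-duplicate the names first so the assignment succeeds.

```r
df <- data.frame(x = 1:2)

# Assigning duplicate row names is an error:
# "duplicate 'row.names' are not allowed"
res <- try(rownames(df) <- c("a", "a"), silent = TRUE)

# make.unique() appends suffixes to repeated names ("a", "a.1"),
# making them valid row names
rownames(df) <- make.unique(c("a", "a"))
```

This is another escape hatch if you really do need row names after reading the files: de-duplicate the candidate names with make.unique() before assigning them.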
