I have many .csv files which are similar in structure:
1.csv
Type n
A 1
B 20
C 34
D 5
...
2.csv
Type n
A 2
B 15
C 16
D 5
...
I want to combine them into something like:
Type n1 n2
A 1 2
B 20 15
C 34 16
D 5 5
...
When I use lapply I get
Type n Type n
A 1 A 2
B 20 B 15
C 34 C 16
D 5 D 5
...
Is there any simple way to combine them properly?
I'm open to solutions in either R or Python.
Here are two options to consider if the structure is identical, but first, some sample data:
cat("Type n", "A 1", "B 20", "C 34", "D 5", sep = "\n", file = "myfile1.txt")
cat("Type n", "A 2", "B 15", "C 16", "D 5", sep = "\n", file = "myfile2.txt")
Option 1: Drop the first column when you're reading the data in by using "NULL" (with quotes) as the colClasses for the column that needs to be dropped. Use cbind to put the files together.
x <- read.table("myfile1.txt", header=TRUE)
y <- read.table("myfile2.txt", header=TRUE, colClasses=c("NULL", "numeric"))
cbind(x, y)
# Type n n
# 1 A 1 2
# 2 B 20 15
# 3 C 34 16
# 4 D 5 5
## For more files:
## do.call(cbind, list(x, y, ...))
Option 2: Read the files in normally, put everything in a list, and cbind the list together. Then subset the result with a c(FALSE, TRUE) vector (it is recycled, so it keeps every second column) and cbind that together with the first column from any of the objects.
x1 <- read.table("myfile1.txt", header = TRUE)
y1 <- read.table("myfile2.txt", header = TRUE)
fileList <- list(x1, y1)
cbind(x1[1], do.call(cbind, fileList)[c(FALSE, TRUE)])
# Type n n.1
# 1 A 1 2
# 2 B 20 15
# 3 C 34 16
# 4 D 5 5
Of course, the above are just minimal examples. I'm presuming that you actually have more than 2 columns in each file. For the second option, use a vector of TRUEs and FALSEs that actually matches your columns to keep and drop (respectively); for the first option, use "NULL" and the appropriate column classes.
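For readers working in Python, a rough analogue of Option 1 is sketched below. It assumes pandas, whitespace-separated files, and rows that appear in the same order in every file. The in-memory strings stand in for "myfile1.txt" and "myfile2.txt", and the names n1 and n2 are chosen only for illustration.

```python
import io

import pandas as pd

# In-memory stand-ins for myfile1.txt and myfile2.txt (same sample data).
file1 = "Type n\nA 1\nB 20\nC 34\nD 5\n"
file2 = "Type n\nA 2\nB 15\nC 16\nD 5\n"

# Read the first file in full; read only the "n" column from the second,
# which plays the role of colClasses = c("NULL", "numeric") in R.
x = pd.read_csv(io.StringIO(file1), sep=r"\s+")
y = pd.read_csv(io.StringIO(file2), sep=r"\s+", usecols=["n"])

# Bind the columns side by side (rows are matched by position, not by Type).
side_by_side = pd.concat([x, y], axis=1)
side_by_side.columns = ["Type", "n1", "n2"]
print(side_by_side)
```

Because the rows are matched by position, this is only safe when every file lists the same "Type" values in the same order.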
If the data structures are similar but not identical, you might need to use merge instead. Consider the following sample data. The first three files have the same structure, but the fourth one, "myfile4.txt", has "A", "B", "D", and "E" as the "Type" values, while the other three have "A", "B", "C", and "D".
cat("Type n", "A 1", "B 20", "C 34", "D 5", sep = "\n", file = "myfile1.txt")
cat("Type n", "A 2", "B 15", "C 16", "D 5", sep = "\n", file = "myfile2.txt")
cat("Type n", "A 1", "B 5", "C 6", "D 7", sep = "\n", file = "myfile3.txt")
cat("Type n", "A 8", "B 9", "D 11", "E 0", sep = "\n", file = "myfile4.txt")
Here's how we can tackle this.
Bulk read in the files:
x <- list.files(pattern="myfile")
y <- lapply(x, read.table, header = TRUE)
Multiple merges will probably result in an error if merge can't make unique names. Help merge out by making unique names for the non-id columns to start.
library(data.table) ## for `setnames`
## setnames will silently assign new names
## to the original data in list "y"
invisible(lapply(seq_along(y), function(z)
  setnames(y[[z]], "n", paste("n", z, sep = "_"))))
Use Reduce to merge the list items together using the "Type" column as the "id".
Reduce(function(x, y) merge(x, y, by = "Type", all = TRUE), y)
# Type n_1 n_2 n_3 n_4
# 1 A 1 2 1 8
# 2 B 20 15 5 9
# 3 C 34 16 6 NA
# 4 D 5 5 7 11
# 5 E NA NA NA 0
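The same rename-then-merge idea can be sketched in Python. This is only an illustration, assuming pandas: the in-memory strings stand in for the four "myfile" sample files, and functools.reduce plays the role of R's Reduce.

```python
import io
from functools import reduce

import pandas as pd

# In-memory stand-ins for myfile1.txt ... myfile4.txt (same sample data).
raw_files = [
    "Type n\nA 1\nB 20\nC 34\nD 5\n",
    "Type n\nA 2\nB 15\nC 16\nD 5\n",
    "Type n\nA 1\nB 5\nC 6\nD 7\n",
    "Type n\nA 8\nB 9\nD 11\nE 0\n",
]

# Give each value column a unique name (n_1, n_2, ...) before merging.
frames = []
for i, text in enumerate(raw_files, start=1):
    df = pd.read_csv(io.StringIO(text), sep=r"\s+")
    frames.append(df.rename(columns={"n": f"n_{i}"}))

# Outer-join everything on "Type"; how="outer" corresponds to all = TRUE.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="Type", how="outer"),
    frames,
)
print(merged)
```

Types missing from a file show up as NaN in that file's column, so the affected columns are stored as floats rather than integers.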
In Python you should use pandas to perform these operations:
import pandas as pd
df1 = pd.read_csv('1.csv', sep=r'\s+', index_col=0)
df2 = pd.read_csv('2.csv', sep=r'\s+', index_col=0)
pd.concat([df1, df2], axis=1)
Out[16]:
n n
Type
A 1 2
B 20 15
C 34 16
D 5 5
If you want the columns renamed automatically:
pd.merge(df1, df2, left_index=True, right_index=True, suffixes=['1', '2'])
Out[20]:
n1 n2
Type
A 1 2
B 20 15
C 34 16
D 5 5
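With more than two files, suffixes become awkward. One possible approach, sketched here under the assumption that every file has a "Type" column and a single "n" column, is to collect the "n" column of each file in a dict and let pd.concat use the dict keys as column names. The in-memory strings stand in for files such as 1.csv and 2.csv, and the third one is an extra illustrative input reusing the "myfile3.txt" sample values.

```python
import io

import pandas as pd

# In-memory stand-ins for the input files; the keys become the column names.
sources = {
    "n1": "Type n\nA 1\nB 20\nC 34\nD 5\n",
    "n2": "Type n\nA 2\nB 15\nC 16\nD 5\n",
    "n3": "Type n\nA 1\nB 5\nC 6\nD 7\n",
}

# Read each file with "Type" as the index and keep only its "n" column.
columns = {
    name: pd.read_csv(io.StringIO(text), sep=r"\s+", index_col=0)["n"]
    for name, text in sources.items()
}

# Align the Series on the "Type" index and place them side by side.
combined = pd.concat(columns, axis=1)
print(combined)
```

Since the alignment is done on the index, rows are matched by their "Type" value rather than by position.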
Here is another solution, assuming no merging needs to be done. If you have three files, for example, you can read them in like this:
n <- 1:3
x <- lapply(sprintf('%s.csv', n), read.csv)
You just want to drop the first column in every table, so you can use sapply() with [[.data.frame to remove the unwanted column, and then combine it all into one data frame. Note that the negative index with [[ works here only because each table has exactly two columns.
data.frame(Type = x[[1]]$Type, sapply(x, '[[', -1))
Or if you really want the names in the form n1, n2, etc.:
data.frame(
Type = x[[1]]$Type,
setNames(lapply(x, '[[', -1), paste0('n', n))
)