How to combine different .csv files into one complete file by adding the data of every file as an additional column using R?
I have several different folders, each containing a single .csv file. Each of those .csv files has a single column containing the data of one condition of an experiment. I would like to merge those .csv files in such a way that the data of every file is added as a new column.
At the moment, the files look like this:
C1.csv
102
106
152
196
223
486
553
C2.csv
296
299
843
1033
1996
However, I would like to have one single .csv file, where each separate file is copied into a new column named after its source file, like:
C1 C2 ... Cn
102 296 ... ...
106 299 ...
152 843 ...
196 1033 ...
223 1996 ...
486 ...
553 ...
So far, I have the following code:
myFiles = list.files(path = ".", recursive = TRUE, pattern = ".csv", full.names = TRUE)
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
Max <- max(sapply(data, length))
data <- lapply(data, function(x) c(x, rep(NA, Max - length(x))))
data <- do.call(cbind, data)
names(data) <- sub("^[^[:alnum:]]*([[:alnum:]]+)\\.csv$", "\\1", myFiles)
write.csv(data, "outfile.csv")
It yielded a document that looks like this instead of adding the data of every .csv file in a new column:
Is this what you want?
Note that I read the files in with scan. Since the files have only one column, there is no need for a more complex function like read.csv.
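As a side note on why the choice of reader matters: read.table returns a data frame, and length() of a data frame counts its columns, not its rows, so the padding step in the question never sees the real lengths. The following is a minimal sketch on made-up in-memory vectors, not on real files; the names c1, c2, cols, padded and result are only for illustration.

```r
# Hypothetical in-memory stand-ins for the contents of C1.csv and C2.csv
c1 <- c(102, 106, 152, 196, 223, 486, 553)
c2 <- c(296, 299, 843, 1033, 1996)

# A one-column data frame has length 1 (its number of columns),
# while the plain vector has one element per value
df <- data.frame(V1 = c1)
length(df)  # 1
length(c1)  # 7

# Pad plain vectors with NA up to the longest one, then bind them as columns
cols <- list(C1 = c1, C2 = c2)
Max <- max(lengths(cols))
padded <- lapply(cols, function(x) c(x, rep(NA, Max - length(x))))
result <- do.call(cbind, padded)
colnames(result)  # "C1" "C2"
```

scan returns plain numeric vectors like c1 and c2 here, which is why the padding in the code that follows behaves as intended.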
myFiles <- list.files(path = ".", pattern = "^C.*\\.csv$", full.names = TRUE, recursive = TRUE)
data <- lapply(myFiles, scan)
Max <- max(sapply(data, length))
data <- lapply(data, function(x) c(x, rep(NA, Max - length(x))))
data <- do.call(cbind, data)
# cbind returns a matrix, so set the column names with colnames(), not names()
colnames(data) <- sub("\\.csv$", "", basename(myFiles))
write.csv(data, "outfile.csv")
The contents of "outfile.csv" are
"","C1","C2"
"1",102,296
"2",106,299
"3",152,843
"4",196,1033
"5",223,1996
"6",486,NA
"7",553,NA
One can read all files into a list using read.table. Combine all the data using dplyr::bind_rows. Afterwards, use reshape2::dcast to spread the data into wide format, with one column for the data from every file.
# Get list of .csv files, including those in subfolders
fileList <- list.files(".", pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)
# Read file data. This will generate a list containing data frames
listData <- lapply(fileList, read.table)
# Name the list elements after the files (without the .csv extension)
names(listData) <- sub("\\.csv$", "", basename(fileList))
library(tidyverse)
library(reshape2)
bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%
  dcast(rowNum~FileName, value.var = "V1") %>%
  select(-rowNum) %>%
  write.csv(file="Result.csv")
# Content of Result.csv
# "","C1","C2"
# "1",102,296
# "2",106,299
# "3",152,843
# "4",196,1033
# "5",223,1996
# "6",486,NA
# "7",553,NA
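reshape2 has since been retired in favor of tidyr, so the same long-to-wide step can also be written with tidyr::pivot_wider. The following is a rough sketch under that substitution, not the code from the answer above. It runs on made-up in-memory data frames instead of real files, and listData here is a hypothetical stand-in for the list that read.table would produce.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for the data frames that read.table would return
listData <- list(
  C1 = data.frame(V1 = c(102, 106, 152, 196, 223, 486, 553)),
  C2 = data.frame(V1 = c(296, 299, 843, 1033, 1996))
)

wide <- bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%  # position of each value within its file
  ungroup() %>%
  pivot_wider(names_from = FileName, values_from = V1) %>%  # one column per file
  select(-rowNum)
```

Shorter files are filled with NA in the rows they do not reach, and wide can be passed to write.csv in the same way as before.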