
How to combine different .csv files to one complete file by adding the data of every file as an additional row using R?

I have several different folders, each containing a single .csv file. Each of those .csv files has a single column containing the data of one condition of an experiment. I would like to merge those .csv files so that the data of every file is added as a new column.

At the moment, the files look something like this:

C1.csv
102
106
152
196
223
486
553

C2.csv
296
299
843
1033
1996

However, I would like to have one single .csv file in which each separate file is copied into its own column, named after the source file, like:

C1     C2     ...    Cn
102    296    ...    ...
106    299    ...
152    843    ...
196    1033   ...
223    1996   ...
486           ...
553           ...

So far, I have the following code:

myFiles = list.files(path = ".", recursive = TRUE, pattern = ".csv", full.names = TRUE)
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
Max <- max(sapply(data, length))
data <- lapply(data, function(x) c(x, rep(NA, Max - length(x))))
data <- do.call(cbind, data)
names(data) <- sub("^[^[:alnum:]]*([[:alnum:]]+)\\.csv$", "\\1", myFiles)

write.csv(data, "outfile.csv")

Instead of adding the data of every .csv file as a new column, it yielded a document that looks like this:

[screenshot of the resulting file]

Is this what you want?
Note that I read the files in with scan. Since the files have only one column, there is no need for a more complex function like read.csv.

# Find every C*.csv file, including those in subfolders
myFiles <- list.files(path = ".", pattern = "^C.*\\.csv$", full.names = TRUE, recursive = TRUE)
# Each file holds one numeric column, so scan() returns a plain vector
data <- lapply(myFiles, scan)
# Pad the shorter vectors with NA so all have the same length
Max <- max(sapply(data, length))
data <- lapply(data, function(x) c(x, rep(NA, Max - length(x))))
data <- do.call(cbind, data)
# cbind() returns a matrix, so set colnames() rather than names()
colnames(data) <- tools::file_path_sans_ext(basename(myFiles))

write.csv(data, "outfile.csv")

The contents of "outfile.csv" are

"","C1","C2"
"1",102,296
"2",106,299
"3",152,843
"4",196,1033
"5",223,1996
"6",486,NA
"7",553,NA
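
The pad-and-combine idea can also be tried end to end on throwaway data. The following is a minimal sketch, not the answerer's code: the temporary directory, the subfolder names cond1 and cond2, and the variable names are assumptions made for illustration. It relies on the base R behaviour that assigning a larger length to a vector pads it with NA.

```r
# Hypothetical demo data: recreate C1.csv and C2.csv in separate subfolders
# of a temporary directory, mirroring the layout described in the question.
root <- file.path(tempdir(), "combine_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "cond1"), recursive = TRUE)
dir.create(file.path(root, "cond2"), recursive = TRUE)
writeLines(as.character(c(102, 106, 152, 196, 223, 486, 553)),
           file.path(root, "cond1", "C1.csv"))
writeLines(as.character(c(296, 299, 843, 1033, 1996)),
           file.path(root, "cond2", "C2.csv"))

files <- list.files(root, pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)

# Read each single-column file as a numeric vector, named after its file.
cols <- lapply(files, scan, quiet = TRUE)
names(cols) <- tools::file_path_sans_ext(basename(files))

# `length<-` pads a vector with NA when the new length is larger.
n_max <- max(lengths(cols))
combined <- as.data.frame(lapply(cols, `length<-`, n_max))

# row.names = FALSE leaves out the leading column of row numbers.
write.csv(combined, file.path(root, "outfile.csv"), row.names = FALSE)
```

Because the list is named before it is converted to a data frame, the column headers come from the file names without a separate renaming step.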

One can read all files into a list using read.table. Combine all the data using dplyr::bind_rows. Afterwards, use reshape2::dcast to spread the data into wide format, with one column for the data from each file.

# Get list of files in directory (pattern is a regular expression, not a glob)
fileList <- list.files(".", "\\.csv$", full.names = TRUE)

# Read file data. This will generate a list containing dataframes
listData <- lapply(fileList, read.table)

# Name list using name of files, without the .csv extension
names(listData) <- sub("\\.csv$", "", basename(fileList))

library(tidyverse)
library(reshape2)

bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%
  dcast(rowNum~FileName, value.var = "V1") %>%
  select(-rowNum) %>%
  write.csv(file="Result.csv")

# Content of Result.csv
# "","C1","C2"
# "1",102,296
# "2",106,299
# "3",152,843
# "4",196,1033
# "5",223,1996
# "6",486,NA
# "7",553,NA
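
Since reshape2 is in maintenance mode, the same reshaping step can also be sketched with tidyr::pivot_wider. This is an illustrative variant rather than the answerer's code: it assumes dplyr and tidyr are installed, and it builds listData in memory as a stand-in for the data frames that read.table would return.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for listData: one data frame per file, using the
# default column name V1 that read.table assigns.
listData <- list(
  C1 = data.frame(V1 = c(102, 106, 152, 196, 223, 486, 553)),
  C2 = data.frame(V1 = c(296, 299, 843, 1033, 1996))
)

wide <- bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%   # position of each value within its file
  ungroup() %>%
  pivot_wider(names_from = FileName, values_from = V1) %>%
  select(-rowNum)                     # drop the helper column before writing
```

The row number within each file serves as the key that lines values up across columns; files with fewer values get NA in the remaining rows. The result can be passed to write.csv in the same way as in the pipeline above.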
