How can I read multiple csv files into R at once and know which file the data is from?
I want to read multiple csv files into R and combine them into one large table. However, I need a column that identifies which file each row came from.
Basically, every row has a unique identifying number within a file, but those numbers are repeated across files. If I bind all the files into one table without knowing which file each row is from, I no longer have a unique identifier, which makes my planned analysis impossible.
This is what I have so far, but it doesn't tell me which file the data came from.
library(magrittr)  # provides %>%
library(rlist)     # provides list.rbind()
list_file <- list.files(pattern = "*.csv") %>% lapply(read.csv, stringsAsFactors = FALSE)
combo_data <- list.rbind(list_file)
I have about 100 files to read in, so I'd really appreciate any help that saves me from doing them all individually.
One way would be to use map_df from purrr to bind all the csv files into one data frame with a column that identifies the source file.
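library(dplyr)  # provides %>% and mutate()
# The pattern is a regular expression; "\\.csv$" matches names ending in ".csv"
filenames <- list.files(pattern = "\\.csv$")
# .id adds a column holding each file's position in `filenames`
combo_data <- purrr::map_df(filenames, read.csv, stringsAsFactors = FALSE, .id = "filename") %>%
  mutate(filename = filenames[as.integer(filename)])  # map the position back to the file name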
Also:
combo_data <- purrr::map_df(filenames,
  ~ dplyr::mutate(read.csv(.x, stringsAsFactors = FALSE), filename = .x))
In base R:
combo_data <- do.call(rbind, lapply(filenames, function(x)
  cbind(read.csv(x, stringsAsFactors = FALSE), filename = x)))
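To try the base R version without touching real data, here is a minimal sketch that writes two small csv files to a temporary directory and then combines them. The file names `a.csv` and `b.csv`, the columns `id` and `value`, and the directory `tmp_dir` are made up for illustration; `basename()` is used so the new column holds only the file name rather than the full path.

```r
# Hypothetical demo data: two small csv files whose ids overlap.
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, value = c(30, 40, 50)),
          file.path(tmp_dir, "b.csv"), row.names = FALSE)

# "\\.csv$" is a regular expression matching names that end in ".csv".
filenames <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file, tag its rows with the file name, then stack everything.
combo_data <- do.call(rbind, lapply(filenames, function(x) {
  cbind(read.csv(x, stringsAsFactors = FALSE), filename = basename(x))
}))

print(combo_data)
```

The ids 1 and 2 appear in both files, so the `filename` column is what tells the rows apart.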
In case you want to use base R, you can use:
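file.names <- list.files(pattern = "\\.csv$")
df.list <- lapply(file.names, function(file.name) {
  df <- read.csv(file.name)
  df$file.name <- file.name  # record the source file for every row
  df
})
# do.call(rbind, ...) is the base R equivalent of rlist::list.rbind()
df <- do.call(rbind, df.list)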
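Once the file name is attached, the per-file number and the file name together can serve as the unique identifier the question asks for. A minimal sketch, assuming the per-file number lives in a column called `id` (a hypothetical name, the question does not give one) and using a small stand-in for the combined table; `row.key` is likewise just an illustrative column name:

```r
# Hypothetical stand-in for the combined table built above.
df <- data.frame(
  id        = c(1, 2, 1, 2, 3),
  file.name = c("a.csv", "a.csv", "b.csv", "b.csv", "b.csv"),
  stringsAsFactors = FALSE
)

# Drop the ".csv" extension and paste the file name onto the per-file id.
df$row.key <- paste(tools::file_path_sans_ext(df$file.name), df$id, sep = "_")

# Fails loudly if any key is still duplicated.
stopifnot(!anyDuplicated(df$row.key))
```

Keeping `id` and `file.name` as separate columns works just as well for grouping or joining; the pasted key is only a convenience when a single identifier column is needed.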