Reading multiple csv files in a folder for individual groups
I have 44496 csv files in a folder.
If I want to read all of the csv files in a folder, I can do this:
files = list.files(pattern="*.csv")
library(data.table)
DT = do.call(rbind, lapply(files, fread))
Each file is named wrX_Y.csv. I have 5562 values of X and 8 values of Y, so for each value of X I have 8 csv files.
wr1_258, wr1_260, wr1_265, wr1_280, wr1_290, wr1_300, wr1_310,wr1_320
wr2_258, wr2_260, wr2_265, wr2_280, wr2_290, wr2_300, wr2_310,wr2_320
.
.
.
.
wr5562_258, wr5562_260, wr5562_265, wr5562_280, wr5562_290, wr5562_300, wr5562_310,wr5562_320
I want to merge all the files that belong to a given X. For example,
wr1_258, wr1_260, wr1_265, wr1_280, wr1_290, wr1_300, wr1_310,wr1_320 into a single csv
wr2_258, wr2_260, wr2_265, wr2_280, wr2_290, wr2_300, wr2_310,wr2_320 into a single csv and so on
Suppose names.list is a vector containing all the values of X. How can I read all the csv files that belong to a single X, merge them, and write the result out?
for(i in names.list){
  files <- list.files(pattern = "*.csv", full.names = T)
  DT = do.call(rbind, lapply(files, fread)) # should read only those csv files which belong to i
  fwrite(DT,paste0(i,"alldata.csv"))
}
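This seems more of a regex question than a data.table one. Refine the pattern you pass to the list.files function as follows:
for(i in names.list) {
  # "^wr<i>_" matches only the files for this X; "\\.csv$" matches the literal extension
  files <- list.files(pattern=paste0("^wr", i, "_.*\\.csv$"), full.names=TRUE)
  DT <- rbindlist(lapply(files, fread))
  fwrite(DT, paste0(i, "alldata.csv"))
}
This should work.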
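A per-X pattern can be checked on a handful of names before running it against the whole folder. The sketch below uses made-up file names (they are assumptions, not files from the question) and base R only. It also shows one way names.list could be derived from the file names themselves.

```r
# Hypothetical file names, used only to illustrate the matching.
files <- c("wr1_258.csv", "wr1_260.csv", "wr11_258.csv",
           "wr2_258.csv", "1alldata.csv")

# Keep only the inputs named wrX_Y.csv (this skips merged outputs).
inputs <- grep("^wr[0-9]+_[0-9]+\\.csv$", files, value = TRUE)

# Derive the X values from the names instead of typing them by hand.
names.list <- unique(sub("^wr([0-9]+)_.*$", "\\1", inputs))

# Files selected for X = 1: the "_" after the number keeps wr11_ out.
x1_files <- grep(paste0("^wr", 1, "_.*\\.csv$"), files, value = TRUE)

print(names.list)
print(x1_files)
```

Anchoring with ^ and $ and escaping the dot keeps the match strict, so an output file such as 1alldata.csv written into the same folder is not picked up on a later run.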
library(data.table)
## list all csv files in the folder; replace 'path' with the folder containing the .csv files
list_files <- list.files(path = 'path', pattern = '\\.csv$')
## create a dictionary (a named list)
## this groups the file names by X value (1, 2, ..., 5562)
ffiles <- list()
for (i in seq_along(list_files))
{
  file_name <- list_files[i]
  ## "wr12_258.csv" -> "wr12" -> "12"
  string <- unlist(strsplit(file_name, split = '_'))[1]
  string <- gsub(pattern = '[a-z]', replacement = '', x = string)
  ffiles[[string]] <- c(ffiles[[string]], file_name)
}
## rbind and write all files, one output per X
for (i in names(ffiles))
{
  files_path <- file.path('path', ffiles[[i]])
  df <- do.call(rbind, lapply(files_path, fread))
  fwrite(df, paste0('df_', i, '.csv'))
}
You can use the rbindlist() function to rbind a list of data.tables, and its idcol argument to add an extra column indicating each row's file of origin:
library(data.table)
# Load all files into a named list of data.tables:
files <- list.files(pattern="*.csv")
dt_list <- lapply(files, fread)
names(dt_list) <- files # for the idcol argument in rbindlist
# Concatenate all data.tables into a single data.table, with a column
# indicating each row's file of origin.
# add use.names = TRUE if your columns are not in the same order in
# each file.
# add fill = TRUE if some columns are not present in all files
dt <- rbindlist(dt_list, idcol = "file")
# Convert file column to a column of X and Y
dt[, fileX := gsub("_.*", "", file)]
dt[, fileX := gsub("^wr", "", fileX)]
dt[, fileY := gsub(".*_", "", file)]
dt[, fileY := gsub("\\.csv$", "", fileY)]
# For each X, output the corresponding data.table:
for (xtype in unique(dt$fileX)) {
  # subset dt and drop file identifier columns
  xdt <- dt[fileX == xtype]
  xdt[, file := NULL]
  xdt[, fileX := NULL]
  xdt[, fileY := NULL]
  # write table:
  fwrite(xdt, file=paste0("wr", xtype, ".csv"))
}
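With 44496 files, holding everything in one data.table may take a lot of memory. As a sketch of a variant, the file names can be grouped by X first with split(), so that only one group is read at a time. The temporary folders, the demo files and the value column below are made up for illustration. rbindlist() and its idcol argument are used as in the answer above.

```r
library(data.table)

# Demo setup (hypothetical data): a few small files in a temporary folder.
in_dir <- file.path(tempdir(), "wr_demo")
dir.create(in_dir, showWarnings = FALSE)
for (f in c("wr1_258.csv", "wr1_260.csv", "wr2_258.csv")) {
  fwrite(data.table(value = 1:2), file.path(in_dir, f))
}

files <- list.files(in_dir, pattern = "^wr[0-9]+_[0-9]+\\.csv$")

# Group the file names by X, e.g. "1" -> wr1_258.csv, wr1_260.csv.
groups <- split(files, sub("^wr([0-9]+)_.*$", "\\1", files))

out_dir <- file.path(tempdir(), "wr_merged")
dir.create(out_dir, showWarnings = FALSE)
for (x in names(groups)) {
  paths <- file.path(in_dir, groups[[x]])
  names(paths) <- groups[[x]]  # the source file name ends up in the idcol
  dt <- rbindlist(lapply(paths, fread), idcol = "file")
  fwrite(dt, file.path(out_dir, paste0("wr", x, ".csv")))
}
```

Writing the merged files to a separate folder keeps them from being matched as inputs if the script is run again.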