
R: read a CSV file from multiple folders and write an xlsx file keeping the sheet names

The directory structure is:

data -> topic1 -> question1 -> sheetName.csv
               -> question2 -> sheetName.csv
               ...
     -> topic2 -> question1 -> sheetName.csv
               -> question2 -> sheetName.csv
     ...

The output I want is one Excel file per 'topic'. In each file, the sheets correspond to the sheetName.csv files within that topic. E.g., an Excel file named topic1.xlsx with 3 sheets, corresponding to the 3 sheetName.csv files in topic 1.

BUT I also want to keep the sheet names as in the original .csv files. Note that 'sheetName' is random (i.e., it does not follow any pattern).

Here is the code I have tried so far:

library(readxl)
library(writexl)
library(dplyr)

pathName <- "/data/"
topicName <- list.files(path = pathName)
for(i in 1:length(topicName)) {
  topicPath <- paste(pathName, topicName[[i]], sep = "")
  files_to_read = list.files(
    path = topicPath,
    pattern = '*.csv',
    recursive = TRUE,
    full.names = TRUE
  )
  data_lst <- list()  
  data_lst <- lapply(files_to_read, read.csv)
  setwd(pathName)  
  write_xlsx(data_lst, path = paste(topicName[[i]], ".", "xlsx", sep = ""))
}

The output I got is an Excel file for each topic containing the corresponding CSV sheets, but the sheet names are "sheet 1, sheet 2, etc.". Is there a way to keep the sheet names while writing to an Excel file?

OK, first I'll programmatically generate CSV files that mirror the directory structure you described. The CSVs will be named with random strings of digits.

dir.create("data", showWarnings = FALSE)
topics <- c("topic1", "topic2")
questions <- c("question1", "question2")

for (topic in topics) {
  for (question in questions) {
    question_dir <- file.path("data", topic, question)
    dir.create(question_dir, recursive = TRUE, showWarnings = FALSE)
    for (k in 1:3) {
      # Random digit string used as the sheet (file) name
      sheet <- as.character(round(runif(1, 1, 99999999)))
      print(sheet)
      file.name <- file.path(question_dir, paste0(sheet, ".csv"))
      # row.names = FALSE avoids an extra index column when the file is read back
      write.csv(data.frame(x = 1), file = file.name, row.names = FALSE)
    }
  }
}

Next, to answer your question:

To write the CSV file names as XLSX worksheet names, I created a for loop that gets the sheet name from the file name using two calls to strsplit(), and then calls xlsx::write.xlsx() to write the file. I used the xlsx package for writing because it allows specifying a sheet name and writing to the same .xlsx file with an append flag.

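library(xlsx)

pathName <- "data/"
# Only directories are topics; this skips .xlsx files left in data/ by an earlier run
topicName <- basename(list.dirs(pathName, recursive = FALSE))
for (i in seq_along(topicName)) {
  topicPath <- paste0(pathName, topicName[[i]])
  files_to_read <- list.files(
    path = topicPath,
    pattern = "\\.csv$",  # a regular expression, not a glob
    recursive = TRUE,
    full.names = TRUE
  )
  file_name <- paste0(pathName, topicName[[i]], ".xlsx")
  unlink(file_name)  # start fresh so a rerun does not hit duplicate sheet names
  for (k in seq_along(files_to_read)) {
    # Path is data/<topic>/<question>/<sheet>.csv: take the 4th component, then drop ".csv"
    sheet_name <- strsplit(strsplit(files_to_read[k], "/")[[1]][4], "\\.")[[1]][1]
    dat <- read.csv(files_to_read[k])
    # append = TRUE adds a sheet to the workbook instead of overwriting it
    write.xlsx(dat, file = file_name, sheetName = sheet_name, row.names = FALSE, append = TRUE)
  }
}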
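
If you would rather stay with writexl from the question, a possible alternative is to name the list before writing it: write_xlsx() uses the names of a named list of data frames as the sheet names. The sketch below is only an illustration under a few assumptions: write_topic_workbooks() is a hypothetical helper name, not part of any package, and the CSV file names are assumed to be unique within a topic and short enough to be valid Excel sheet names (31 characters at most).

```r
library(writexl)

# Hypothetical helper: write one workbook per topic folder under `root`,
# naming each sheet after the CSV file it came from.
write_topic_workbooks <- function(root, out_dir = root) {
  # Only sub-directories of `root` count as topics
  topic_dirs <- list.dirs(root, recursive = FALSE, full.names = TRUE)
  out_files <- character(0)
  for (topic_dir in topic_dirs) {
    csv_files <- list.files(topic_dir, pattern = "\\.csv$",
                            recursive = TRUE, full.names = TRUE)
    if (length(csv_files) == 0) next  # nothing to write for this topic
    sheets <- lapply(csv_files, read.csv)
    # File name without directory or extension becomes the sheet name
    # (assumes these names are unique within the topic)
    names(sheets) <- tools::file_path_sans_ext(basename(csv_files))
    out_file <- file.path(out_dir, paste0(basename(topic_dir), ".xlsx"))
    write_xlsx(sheets, path = out_file)
    out_files <- c(out_files, out_file)
  }
  out_files  # paths of the workbooks that were written
}
```

With the directory layout above, calling write_topic_workbooks("data") should write data/topic1.xlsx and data/topic2.xlsx. Using basename() with tools::file_path_sans_ext() also avoids depending on the folder depth, which the hard-coded index 4 in the strsplit() version does.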
