
Reading multiple CSV files in S3 and combining them into a single file using R when the file names differ

Each day I have multiple CSV files with different names. I want to combine all the CSVs for each day into a single file, and run this in a loop for the other days as well.

    path = 's3://data/y=2017/m=05'

In m=05 I have around 200 CSV files with different names, and in others, such as m=06, I have 120 CSV files.

    library(aws.s3)

    dates <- seq(as.Date('2017-05-05'), as.Date('2017-06-10'), by = "days")
    for (i in seq_along(dates)) {
      dateofgen <- dates[i]  # take the i-th date, not the whole vector
      # path is assumed to be the bucket root here (e.g. 's3://data/'); only one hard-coded file is read per day
      filepath <- paste(path, "y=", format(dateofgen, '%Y'), "/m=", format(dateofgen, '%m'), "/d=", format(dateofgen, '%d'), "/part-00012-e731138c-232c-48b0-958f-55f2c72f3327-c000.csv", sep = '')
      data <- s3read_using(read.csv, object = filepath, stringsAsFactors = FALSE, bucket = gsub("/.*", '', gsub("s3://", '', filepath)))
    }

How can I read all the files for a day and combine them into a single file using rbind or another merge function?

    library(readxl)
    library(dplyr)

This gets the names of all .xls files in the given directory. You can also match .csv files with '\\.csv$'. Note that list.files() only lists directories on the local file system.

    file.list <- list.files(path = 's3://data/y=2017/m=05', pattern = '\\.xls$', full.names = TRUE)

This reads each file into a data frame and collects the data frames in a list.

    df.list <- lapply(file.list, read_excel)

This pulls everything out of the list and binds all rows together.

    tibble_of_your_xls_files <- bind_rows(df.list)

For your code I would run:

    file.list <- list.files(path = 's3://data/y=2017/m=05', pattern = '\\.csv$', full.names = TRUE)
    df.list <- lapply(file.list, read.csv, stringsAsFactors = FALSE)  # read.csv, since these are CSV files rather than Excel files
    m052017.df <- bind_rows(df.list)
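The list, read, and bind pattern above can be tried locally before pointing it at real data. The following is a minimal sketch in base R, assuming all files share the same columns; the folder name `combine_demo` and the file names are made up for illustration.

```r
# Create three small CSV files with different names in a temporary directory.
tmp_dir <- file.path(tempdir(), "combine_demo")  # hypothetical demo folder
dir.create(tmp_dir, showWarnings = FALSE)
for (k in 1:3) {
  part <- data.frame(id = (2 * k - 1):(2 * k), source = paste0("part-", k))
  write.csv(part, file.path(tmp_dir, paste0("part-", k, "-abc.csv")), row.names = FALSE)
}

# pattern is a regular expression, so escape the dot and anchor the end;
# full.names = TRUE returns paths that read.csv() can open from any working directory.
csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then stack the resulting data frames row-wise.
df_list  <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
combined <- do.call(rbind, df_list)

nrow(combined)  # 6: three files with two rows each
```

`do.call(rbind, ...)` requires identical column names in every file. If the columns can differ between files, `dplyr::bind_rows()` is the more forgiving choice, since it fills missing columns with `NA`.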

We will use get_bucket_df() to list the objects in the bucket, then use ldply() to go through all objects for each day of the month and read each S3 object with s3read_using().

    library(aws.s3)
    library(plyr)  # for ldply()

    bucket <- "data"  # bucket name as in the question
    days <- as.character(17:31)

    for (i in seq_along(days)) {
      # prefix is the key prefix inside the bucket, without the s3://bucket/ part
      prefix <- paste0("y=2017/m=05/d=", days[i], "/")
      temp_df <- get_bucket_df(bucket = bucket, prefix = prefix)
      temp_df <- temp_df[grepl("\\.csv$", temp_df$Key), ]
      # read every CSV object of the day and stack the rows into one data frame
      new_data <- ldply(temp_df$Key, function(x) {
        s3read_using(read.csv, na.strings = '', header = FALSE, stringsAsFactors = FALSE, object = x, bucket = bucket)
      })
      dateofgen <- as.Date(paste0("2017-05-", days[i]))
      # write the combined file next to the day folders, e.g. y=2017/m=05/newfile2017-05-17.csv
      filepath <- paste0("y=", format(dateofgen, '%Y'), "/m=", format(dateofgen, '%m'), "/newfile", dateofgen, ".csv")
      s3write_using(new_data, FUN = write.csv, row.names = FALSE, object = filepath, bucket = bucket)
      print(paste0("completed for ", dateofgen))
    }
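To loop over a date range that crosses a month boundary, as in the question, the key prefix can be derived from each date instead of from a hard-coded list of days. This is a rough sketch, assuming the y=/m=/d= layout from the question. The helpers `day_prefix`, `csv_keys`, and `combine_day` are hypothetical names, and the listing and reading steps are passed in as functions so the logic can be tried without AWS credentials.

```r
# Build the key prefix for one or more days, e.g. "y=2017/m=05/d=17/".
# Assumes the y=/m=/d= folder layout described in the question.
day_prefix <- function(day) {
  format(as.Date(day), "y=%Y/m=%m/d=%d/")
}

# Keep only the keys that end in ".csv".
csv_keys <- function(keys) {
  keys[grepl("\\.csv$", keys)]
}

# Combine all CSV objects of one day. With aws.s3 the two functions could be:
#   list_keys = function(p) aws.s3::get_bucket_df(bucket = "data", prefix = p)$Key
#   read_one  = function(k) aws.s3::s3read_using(read.csv, object = k, bucket = "data")
combine_day <- function(day, list_keys, read_one) {
  keys <- csv_keys(list_keys(day_prefix(day)))
  if (length(keys) == 0) return(NULL)  # no CSV files for that day
  do.call(rbind, lapply(keys, read_one))
}

# Prefixes for a range that crosses from May into June.
dates <- seq(as.Date("2017-05-30"), as.Date("2017-06-02"), by = "day")
day_prefix(dates)
# "y=2017/m=05/d=30/" "y=2017/m=05/d=31/" "y=2017/m=06/d=01/" "y=2017/m=06/d=02/"

# Stand-ins for the S3 calls, for illustration only.
fake_keys <- c("y=2017/m=05/d=30/part-a.csv",
               "y=2017/m=05/d=30/part-b.csv",
               "y=2017/m=05/d=30/_SUCCESS")
fake_list <- function(prefix) fake_keys[startsWith(fake_keys, prefix)]
fake_read <- function(key) data.frame(key = key, value = 1:2)

combined <- combine_day("2017-05-30", fake_list, fake_read)
nrow(combined)  # 4: two CSV keys with two rows each
```

Returning `NULL` for days without CSV files lets the calling loop skip the write step for those days instead of failing on an empty result.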

