
Reading multiple CSV file objects at once from AWS S3 using the aws.s3 package

I need to read multiple CSV files from an AWS S3 bucket with the aws.s3 package in R and then combine them into a single data frame for further analysis.

Let's say I have several files in my S3 bucket, such as "variables_2019-08-12.csv", "variables_2019-08-13.csv", "variables_2019-08-14.csv", etc.

I am using aws.s3::s3read_using, but its object argument lets me read only one CSV file at a time. The files have a date in their names, so I was wondering how to add a loop here:

library(aws.s3)  # s3read_using()
library(readr)   # read_csv()
my_file <- s3read_using(FUN = read_csv, object = "variables_2019-08-12.csv", bucket = "my_bucket")

There are many ways of doing this in R, but the most intuitive for me is using map_dfr from the {purrr} package:

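library(aws.s3)  # s3read_using()
library(readr)   # read_csv()
library(purrr)   # map_dfr(), which needs dplyr installed for the row binding

objects = c('variables_2019-08-12.csv', 'variables_2019-08-13.csv', …)
# Name each object by the date embedded in its file name
names(objects) = gsub('variables_(.*)\\.csv', '\\1', objects)
# Read every object and row-bind the results; the names go into the 'Date' column
df = map_dfr(
    objects,
    ~ s3read_using(FUN = read_csv, object = .x, bucket = 'my_bucket'),
    .id = 'Date'
)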

Because of the names(objects) assignment, and because we specify .id = 'Date', the resulting data frame has an additional column containing the date (taken from the file name) of each entry.
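Since the file names differ only by date, the vector of object keys can also be generated instead of typed out. The following is only a sketch of that idea: read_all and fake_reader are hypothetical helpers made up for this illustration, and the date range is assumed from the three file names in the question. The stand-in reader lets it run without AWS credentials.

```r
library(purrr)  # map_dfr(); needs dplyr installed for the row binding

# Build one object key per day in the (assumed) date range
dates <- seq(as.Date("2019-08-12"), as.Date("2019-08-14"), by = "day")
objects <- sprintf("variables_%s.csv", format(dates))
names(objects) <- format(dates)  # these names end up in the 'Date' column

# Hypothetical helper: `reader` is any function that takes one object key
# and returns a data frame
read_all <- function(objects, reader) {
  map_dfr(objects, reader, .id = "Date")
}

# Hypothetical stand-in reader so the sketch runs without S3 access
fake_reader <- function(key) data.frame(key = key, n_chars = nchar(key))
df <- read_all(objects, fake_reader)

# Against a real bucket, the reader would wrap the call from the answer:
# df <- read_all(objects, function(key) {
#   aws.s3::s3read_using(FUN = readr::read_csv, object = key, bucket = "my_bucket")
# })
```

Passing the reader in as an argument keeps the key-building and row-binding logic checkable locally before pointing it at the bucket. Note that the Date column comes back as character, so it needs as.Date() if real dates are wanted.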
