Reading multiple CSV file objects from AWS S3 at once with the aws.s3 package

Question

I need to read multiple CSV files from an AWS S3 bucket with the aws.s3 package in R and then combine them into a single data frame for further analysis.

Let's say I have several files in my S3 bucket, such as "variables_2019-08-12.csv", "variables_2019-08-13.csv", "variables_2019-08-14.csv", etc.

I am using aws.s3::s3read_using, but its object argument only lets me read one CSV file at a time. Each file has a date in its name, so I was wondering how to add a loop here:

library(aws.s3)
library(readr)

my_file <- s3read_using(FUN = read_csv, object = "variables_2019-08-12.csv", bucket = "my_bucket")

Answer 1

There are many ways of doing this in R, but the most intuitive for me is map_dfr from the {purrr} package:

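library(aws.s3)
library(purrr)
library(readr)

objects = c('variables_2019-08-12.csv', 'variables_2019-08-13.csv', …)
# Name each object by the date embedded in its file name
names(objects) = gsub('variables_(.*)\\.csv', '\\1', objects)
# Read every object and row-bind the results; .id stores the element names
df = map_dfr(
    objects,
    ~ s3read_using(FUN = read_csv, object = .x, bucket = 'my_bucket'),
    .id = 'Date'
)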

Because of the names(objects) assignment, and because we specify .id = 'Date', the resulting data frame has an additional column containing the date (taken from the file name) for each row.
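
To see how the names and .id interact without needing AWS credentials, here is a minimal sketch that swaps the S3 objects for local files. The temporary directory, the toy column x, and the use of read.csv in place of s3read_using are all assumptions made for illustration; only the names(objects) and .id = "Date" steps come from the answer.

```r
library(purrr)

# Hypothetical local stand-ins for the S3 objects (illustration only).
tmp <- tempfile("s3_demo_")
dir.create(tmp)
dates <- c("2019-08-12", "2019-08-13")
for (d in dates) {
  write.csv(
    data.frame(x = 1:2),
    file.path(tmp, paste0("variables_", d, ".csv")),
    row.names = FALSE
  )
}

objects <- paste0("variables_", dates, ".csv")
# Name each element by the date embedded in its file name.
names(objects) <- gsub("variables_(.*)\\.csv", "\\1", objects)

# read.csv stands in for the s3read_using() call here.
# map_dfr row-binds the results and writes each element's name into "Date".
df <- map_dfr(objects, ~ read.csv(file.path(tmp, .x)), .id = "Date")
print(df)
```

The Date column comes back as character, so convert it with as.Date() if you need real dates. Note also that map_dfr needs the {dplyr} package installed, since it binds rows with dplyr::bind_rows.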

Answer 1
2, accepted, 2019-08-22 09:23:38
