How to read and combine Excel sheets from Amazon's S3?
Goal
I want to read a specific sheet from each Excel workbook stored on Amazon S3. There are hundreds of these workbooks.
Attempt
library(botor)
library(openxlsx)
library(tidyverse)
# Function to download an Excel workbook and read its third sheet, starting at row 6.
read_simple <- function(FUN, s3_path, overwrite = TRUE) {
  tmp <- botor::s3_download_file(s3_path, tempfile(fileext = ".xlsx"), force = overwrite)
  FUN(tmp, startRow = 6, sheet = 3)
}
# Function to bind all files after some tidying
load_several_files <- function(template, list_of_files) {
  # Create a zero-row template that carries the correct column headings
  template_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = template)[0,] %>%
    janitor::clean_names()
  # Take each file and append its rows to the template - all the raw files have the same column headings
  for (each_file in list_of_files) {
    new_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = each_file) %>%
      janitor::clean_names()
    template_file <- template_file %>% bind_rows(new_file)
  }
  return(template_file)
}
#The following produces a list of the links to files in the bucket
list_files <- botor::s3_ls('s3://my_bucket/')
final_list <- list_files[2:nrow(list_files),3]
final_list
#I use the first file in the folder as the template and then try to add all the other files in the bucket.
load_several_files("s3://my_bucket/file1.xlsx", final_list)
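Problem
It does not give me a final template file containing all of the data. Any help would be appreciated.
Answer
Sorry about this. Feel free to delete my question, but I found the answer.
Solution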
library(botor)
library(openxlsx)
library(tidyverse)
#Create list of files in bucket
list_files <- botor::s3_ls('s3://my_bucket/')
final_list <- list_files[2:nrow(list_files),3]
remove(list_files)
final_list
# Function to read each file
read_simple <- function(FUN, s3_path, overwrite = TRUE) {
  tmp <- botor::s3_download_file(s3_path, tempfile(fileext = ".xlsx"), force = overwrite)
  FUN(tmp, startRow = 6, sheet = 3)
}
# Create a zero-row template that carries the correct column headings
template_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = final_list[1])[0,] %>%
  janitor::clean_names()
# Function to bind all files after some tidying
load_several_files <- function(template, list_of_files) {
  # Take each file and append its rows to the template - all the files will have the same column headings
  for (each_file in list_of_files) {
    # No trailing pipe after clean_names(): the next line is a separate statement
    new_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = each_file) %>%
      janitor::clean_names()
    template <- template %>% bind_rows(new_file)
  }
  # The last expression is the function's return value
  template
}
final_file <- load_several_files(template_file, final_list)
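The loop above grows the data frame one file at a time and needs a separate template. A possible alternative, sketched here in base R, is to read every file into a list and bind the list once. The helper names xlsx_uris() and combine_sheets(), the fake listing and the fake reader are all hypothetical and exist only so the sketch runs without S3 access. It also assumes the bucket listing is a data frame with a uri column and that every file yields the same column names.

```r
# Sketch only: xlsx_uris(), combine_sheets() and the fake data below are
# hypothetical names for illustration, not part of botor or openxlsx.

# Keep only the .xlsx entries of a bucket listing, selected by column name
# rather than by row and column position.
# Assumption: the listing is a data frame with a `uri` column.
xlsx_uris <- function(listing) {
  uris <- as.character(listing[["uri"]])
  uris[grepl("\\.xlsx$", uris, ignore.case = TRUE)]
}

# Read every path with `reader`, then stack the results in a single step.
# Assumption: every file yields the same column names.
combine_sheets <- function(paths, reader) {
  pieces <- lapply(paths, reader)
  do.call(rbind, pieces)
}

# Stand-in listing and reader so the sketch runs without S3 access.
fake_listing <- data.frame(
  key = c("", "file1.xlsx", "file2.xlsx"),
  uri = c("s3://my_bucket/",
          "s3://my_bucket/file1.xlsx",
          "s3://my_bucket/file2.xlsx"),
  stringsAsFactors = FALSE
)
fake_reader <- function(path) {
  data.frame(source = path, value = nchar(path), stringsAsFactors = FALSE)
}

paths <- xlsx_uris(fake_listing)
combined <- combine_sheets(paths, fake_reader)
print(combined)
```

With the real bucket, the reader would be a small wrapper around the existing function, for example function(p) read_simple(openxlsx::read.xlsx, p) %>% janitor::clean_names(), and dplyr::bind_rows(pieces) could replace do.call(rbind, pieces) if the tidyverse is already loaded.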