
How to read and combine Excel sheets from Amazon's S3?

Goal

I want to read a specific worksheet from each Excel workbook stored on Amazon S3. There are hundreds of these workbooks.

Attempt

library(botor)
library(openxlsx)
library(tidyverse)

# Function to download an Excel Workbook and extract the third sheet at row 6. 
read_simple <- function(FUN, s3_path, overwrite = TRUE) {
  tmp <- botor::s3_download_file(s3_path, tempfile(fileext = ".xlsx"), force = overwrite)
  FUN(tmp, startRow = 6, sheet = 3)
}

# Function to bind all files after some tidying
load_several_files <- function(template, list_of_files) {
  
  #create template file with all the correct column headings
  template_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = template)[0,] %>% 
    janitor::clean_names() 
  
  #take each file and then add the entries to the template - all the raw files have the same column headings
  for (each_file in list_of_files) {
    new_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = each_file) %>% 
      janitor::clean_names() 
    template_file <- template_file %>% bind_rows(new_file)
  }
  return(template_file)
}

#The following produces a list of the links to files in the bucket
list_files <- botor::s3_ls('s3://my_bucket/')
final_list <- list_files[2:nrow(list_files),3]
final_list

#I use the first file in the folder as the template and then try to add all the other files in the bucket. 
load_several_files("s3://my_bucket/file1.xlsx", final_list)

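Problem

It does not give me a final template file containing all the data. Any help would be much appreciated.

Answer

Sorry about this - feel free to delete my question, but I found the answer.

Solution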

library(botor)
library(openxlsx)
library(tidyverse)

#Create list of files in bucket
list_files <- botor::s3_ls('s3://my_bucket/')
final_list <- list_files[2:nrow(list_files),3]
remove(list_files)
final_list

# Function to read each file
read_simple <- function(FUN, s3_path, overwrite = TRUE) {
  tmp <- botor::s3_download_file(s3_path, tempfile(fileext = ".xlsx"), force = overwrite)
  FUN(tmp, startRow = 6, sheet = 3)
}

#create template file with all the correct column headings
template_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = final_list[1])[0,] %>% 
  janitor::clean_names() 

# Function to bind all files after some tidying
load_several_files <- function(template, list_of_files) {
  
  #take each file and then add the entries to the template - all the files will have the same column headings
  for (each_file in list_of_files) {
    new_file <- read_simple(FUN = openxlsx::read.xlsx, s3_path = each_file) %>% 
      janitor::clean_names()
    # Append this file's rows to the accumulated data frame
    template <- template %>% bind_rows(new_file)
  }
  template
}


final_file <- load_several_files(template_file, final_list)
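
With hundreds of workbooks, growing a data frame inside a loop copies it on every iteration. A common alternative is to read every file into a list and bind once at the end, which also removes the need for an empty template. The following is only a sketch of that idea, not part of the original solution: combine_sheets, keep_xlsx, and the in-memory demo data are hypothetical names introduced for illustration, and the reader is passed in as a function so the example runs without S3 access.

```r
library(dplyr)

# Hypothetical helper: keep only entries that look like .xlsx workbooks,
# instead of dropping rows from the bucket listing by position.
keep_xlsx <- function(uris) {
  uris[grepl("\\.xlsx$", uris, ignore.case = TRUE)]
}

# Hypothetical helper: read every path with `reader`, then bind all results
# in a single call. The `source_file` column records where each row came from.
combine_sheets <- function(paths, reader) {
  tables <- lapply(paths, reader)
  names(tables) <- paths
  dplyr::bind_rows(tables, .id = "source_file")
}

# In-memory stand-ins for the workbooks, so the sketch runs offline.
demo_tables <- list(
  "s3://my_bucket/a.xlsx" = data.frame(id = 1:2, value = c(10, 20)),
  "s3://my_bucket/b.xlsx" = data.frame(id = 3L, value = 30)
)
demo_reader <- function(path) demo_tables[[path]]

# A listing that also contains a folder entry and a non-Excel file.
demo_listing <- c("s3://my_bucket/", names(demo_tables), "s3://my_bucket/notes.txt")

combined <- combine_sheets(keep_xlsx(demo_listing), demo_reader)
print(combined)
```

Against the real bucket, the reader would presumably wrap the read_simple function defined above, for example:

combine_sheets(keep_xlsx(final_list), function(p) janitor::clean_names(read_simple(openxlsx::read.xlsx, p)))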

