How to combine hundreds of Excel files/sheets using R purrr

Question

I've got hundreds of Excel files, each containing a varying number of sheets. I want to combine all these files and sheets into one data frame. Luckily, all the sheets share the same format (they're a template filled out by customers and uploaded to a central repository).

Let's simulate these Excel files and sheets with the code below:

library(tidyverse)
library(openxlsx)
library(readxl)
write.xlsx(list(iris, iris * 2, iris * 3), "three_sheets.xlsx")
write.xlsx(list(iris, iris / 2), "two_sheets.xlsx")

How would I use R purrr to combine these files and sheets into one data frame? And can I mutate a column to identify which file/sheet each row comes from? If purrr isn't the best choice for this type of problem, feel free to point out other solutions.

Answer 1

purrr seems to be a good choice for this kind of operation. You can do:

library(readxl)
library(purrr)

#Get full path of all excel files in the folder
all_files <- list.files('path/of/folder',pattern = '\\.xlsx$', full.names = TRUE)
#For each file, read every sheet and row-bind the results
result <- map_df(all_files, function(x) {
             #Get all the sheet names
             all_sheets <- excel_sheets(x)  
             #read the excel file with one sheet at a time
             map_df(all_sheets, ~read_excel(x, sheet = .x) %>% 
                       #add columns for filename and sheetname
                       dplyr::mutate(filename = basename(x), sheetname = .x))
})
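
To try the approach end to end, here is a minimal sketch that writes two demo workbooks into a scratch folder and then combines them. The folder name excel_demo, the helper read_workbook and the numeric-only num_iris data are assumptions made for illustration, not part of the original question. It uses map() followed by dplyr::bind_rows(), which should behave the same as map_df() here and also works on purrr 1.0.0 and later, where map_df() is superseded.

```r
library(openxlsx)
library(readxl)
library(purrr)
library(dplyr)

# Hypothetical scratch folder standing in for 'path/of/folder'
folder <- file.path(tempdir(), "excel_demo")
dir.create(folder, showWarnings = FALSE)

# Keep only the numeric columns so that multiplying the data frame is well defined
num_iris <- iris[, 1:4]
write.xlsx(list(num_iris, num_iris * 2, num_iris * 3),
           file.path(folder, "three_sheets.xlsx"), overwrite = TRUE)
write.xlsx(list(num_iris, num_iris / 2),
           file.path(folder, "two_sheets.xlsx"), overwrite = TRUE)

# Full paths of all .xlsx files in the folder
all_files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)

# Read every sheet of one workbook and tag each row with its origin
read_workbook <- function(path) {
  sheets <- excel_sheets(path)
  map(sheets, function(sheet) {
    read_excel(path, sheet = sheet) %>%
      mutate(filename = basename(path), sheetname = sheet)
  }) %>%
    bind_rows()
}

# One data frame for all files and sheets
result <- bind_rows(map(all_files, read_workbook))

# One row per file/sheet combination, with its row count
count(result, filename, sheetname)
```

The final count() call is just a quick check that every file/sheet pair contributed rows; with the demo data above, each sheet holds the 150 rows of iris.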

Answer 1
2 Accepted 2020-09-11 03:32:58
