I've got hundreds of Excel files with varying quantities of sheets within said files. I want to combine all these Excel files and sheets into one data frame. Lucky for me, all the sheets are in the same format (they're a template filled out by customers and uploaded to a central repository).
Let's simulate these Excel files and sheets with the code below:
library(tidyverse)
library(openxlsx)
library(readxl)
write.xlsx(list(iris, iris * 2, iris * 3), "three_sheets.xlsx")
write.xlsx(list(iris, iris / 2), "two_sheets.xlsx")
How would I use purrr in R to combine these files and sheets into one data frame? And can I mutate a column to identify which file and sheet each row comes from? If purrr isn't the best choice for this type of problem, feel free to point out other solutions.
purrr seems to be a good choice for this kind of operation. You can do:
library(readxl)
library(purrr)
# Get the full paths of all Excel files in the folder
all_files <- list.files('path/of/folder', pattern = '\\.xlsx$', full.names = TRUE)
# For each file
result <- map_df(all_files, function(x) {
  # Get all the sheet names
  all_sheets <- excel_sheets(x)
  # Read the Excel file one sheet at a time
  map_df(all_sheets, ~read_excel(x, sheet = .x) %>%
    # Add columns for the file name and sheet name
    dplyr::mutate(filename = basename(x), sheetname = .x))
})
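For a version that can be run end to end, here is a minimal sketch of the same idea. It is not the answer's exact code: it assumes purrr 1.0.0 or later and swaps map_df() for map() plus list_rbind(), whose names_to argument adds the identifying columns. The temporary folder, the sheet names a, b and c, and the helper read_all_sheets() are made up for illustration. It writes only the numeric columns of iris, because multiplying the factor column Species gives NA with a warning.

```r
library(openxlsx)  # only used here to create the demo workbooks
library(readxl)
library(purrr)

# Assumption: a throwaway folder stands in for the central repository
demo_dir <- file.path(tempdir(), "xlsx_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Simulated workbooks; list names become the sheet names
num <- iris[, 1:4]
write.xlsx(list(a = num, b = num * 2, c = num * 3),
           file.path(demo_dir, "three_sheets.xlsx"), overwrite = TRUE)
write.xlsx(list(a = num, b = num / 2),
           file.path(demo_dir, "two_sheets.xlsx"), overwrite = TRUE)

# Hypothetical helper: read every sheet of one workbook and stack them,
# storing the sheet name in a "sheetname" column
read_all_sheets <- function(path) {
  sheets <- set_names(excel_sheets(path))  # name each sheet after itself
  list_rbind(map(sheets, function(s) read_excel(path, sheet = s)),
             names_to = "sheetname")
}

# Name each path by its file name so names_to can fill a "filename" column
all_files <- list.files(demo_dir, pattern = "\\.xlsx$", full.names = TRUE)
all_files <- set_names(all_files, basename(all_files))
result <- list_rbind(map(all_files, read_all_sheets), names_to = "filename")

# One row per source sheet, to check the labels
unique(result[c("filename", "sheetname")])
```

Naming the inputs and letting list_rbind() fill the label columns avoids the inner mutate() call. Whether that reads better than the map_df() version is a matter of taste.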