简体   繁体   中英

Merging data from multiple Excel workbooks with multiple Excel worksheets into an R dataframe

I am trying to combine a specific cell range within multiple Excel workbooks (which each have multiple sheets) into an R dataframe. Each of the Excel workbooks has an 'Instructions' sheet that I would like to ignore, then all subsequent sheets are structured in the same layout (and I would like to select cells H4:R17 from each). So far, I have managed to combine all cells across all worksheets within all workbooks into one very messy dataframe. However, I am struggling to identify the correct code to ignore the 'Instructions' sheet when reading this in and to subset by the required cell range (H4:R17). I've included my code below:

Any ideas?

#how to join multiple files based on multiple sheets
read_multiple_excel <- function(path) {
  path %>%
    excel_sheets() %>% 
    set_names() %>% 
    map_df(read_excel,path = path)
}
  
#joining all data.frames
data_df <- dir_ls(regexp = "xlsx") %>% 
  map_df(read_multiple_excel,
         .id = "workbook_name")

str(data_df)
library(tidyverse)
library(readxl)

files <- list.files(
  path = "./sourse/xlsx", 
  pattern = "\\.xlsx$", 
  full.names = T
)

list of files

[1] "./sourse/xlsx/df1.xlsx" "./sourse/xlsx/df2.xlsx"

list of sheet files

link <- data.frame(files = files) %>% 
  mutate(sheets = map(files, excel_sheets)) %>% 
  unnest(sheets) %>% 
  filter(sheets != "instruction")

# A tibble: 4 x 2
  files                  sheets
  <chr>                  <chr> 
1 ./sourse/xlsx/df1.xlsx sh1   
2 ./sourse/xlsx/df1.xlsx sh2   
3 ./sourse/xlsx/df2.xlsx sh1   
4 ./sourse/xlsx/df2.xlsx sh2  

result

map2_df(.x = set_names(link$files), .y = link$sheets, read_xlsx, .id = "workbook", range = "A1:B3")

# A tibble: 8 x 3
  workbook                   x     y
  <chr>                  <dbl> <dbl>
1 ./sourse/xlsx/df1.xlsx     1     3
2 ./sourse/xlsx/df1.xlsx     2     4
3 ./sourse/xlsx/df1.xlsx     5     7
4 ./sourse/xlsx/df1.xlsx     6     8
5 ./sourse/xlsx/df2.xlsx     9    11
6 ./sourse/xlsx/df2.xlsx    10    12
7 ./sourse/xlsx/df2.xlsx    13    15
8 ./sourse/xlsx/df2.xlsx    14    16

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM