R read_excel or readxl Multiple Files with Multiple Sheets - Bind

I have a directory full of .xlsx files. They all have multiple sheets. I want to extract the same sheet from all of the files and append them into a tibble.

I have found numerous solutions for extracting multiple sheets from a single Excel file, but none for extracting a single sheet from multiple files.

I have tried:

    paths = as.tibble(list.files("data/BAH", pattern = ".xlsx", full.names = TRUE, all.files = FALSE))

    test <- paths %>% read_xlsx(sheet = "Portal", col_names = TRUE)

I know the "paths" variable contains all of my file names with their paths. However, I am not sure how to iterate through each file name, appending just the specific sheet ("Portal") to a csv file.

The error is:

Error: path must be a string

I have tried passing in paths as a vector and as a tibble, and tried subscripting it as well. All attempts fail.

So, in summary: I have a directory of xlsx files and I need to extract a single sheet from each one and append it to a csv file. I have tried using purrr with some map functions but could not get that to work either.

My goal was to do this the tidy way.

Thanks for any hints.

You have to use lapply() or map(), with paths as a plain character vector rather than a tibble, because read_xlsx() takes a single path string per call. Try

    test <- lapply(paths, read_xlsx, sheet = "Portal", col_names = TRUE)

or

    library(purrr)
    test <- map_dfr(paths, read_xlsx, sheet = "Portal", col_names = TRUE)

map_dfr() already returns a single combined data frame. The lapply() version returns a list of data frames, which you can then bind with

    library(dplyr)
    test %>% bind_rows()
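
To try the lapply() and bind_rows() approach without the original data, here is a minimal sketch. It assumes the example workbook bundled with readxl (datasets.xlsx, with its "mtcars" sheet) as a stand-in for the files in data/BAH and the "Portal" sheet; the `source` column name and the `file1`, `file2` labels are illustrative choices, not part of the original answer.

```r
library(readxl)
library(dplyr)

# Stand-in for list.files("data/BAH", pattern = "\\.xlsx$", full.names = TRUE):
# the same bundled example workbook listed twice (assumption for illustration).
paths <- c(readxl_example("datasets.xlsx"), readxl_example("datasets.xlsx"))

# read_xlsx() accepts exactly one path, so it is called once per element
# of the character vector. "mtcars" plays the role of the "Portal" sheet.
sheets <- lapply(paths, read_xlsx, sheet = "mtcars", col_names = TRUE)

# Give each list element a distinct label so bind_rows() can record
# which file every row came from in the "source" column.
names(sheets) <- paste0("file", seq_along(paths))
combined <- bind_rows(sheets, .id = "source")
```

Because `paths` is a plain character vector here, each call receives a single string, which is what the "path must be a string" error is asking for.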

Alternatively, with fs::dir_ls() supplying the file paths:

    library(tidyverse)
    library(readxl)
    library(fs)

    # Get all .xlsx files in the directory as a named character vector of paths
    xlsx_files <- fs::dir_ls("data/BAH", regexp = "\\.xlsx$")

    # Read the "Portal" sheet from every file and row-bind the results;
    # .id = "source" adds a column recording which file each row came from
    portal_tabs <- map_dfr(xlsx_files, read_xlsx, sheet = "Portal", col_names = TRUE, .id = "source")
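
The question also asks for the combined sheet to end up in a csv file. A minimal sketch of that last step follows, again assuming readxl's bundled datasets.xlsx (sheet "iris") in place of the real files and the "Portal" sheet. The names `a` and `b`, the output file name, and the use of a temporary directory are all hypothetical.

```r
library(readxl)
library(purrr)
library(readr)

# Named character vector of paths; the names become the "source" values.
# Both entries point at the bundled example workbook (assumption).
paths <- c(a = readxl_example("datasets.xlsx"),
           b = readxl_example("datasets.xlsx"))

# Read one sheet per file and row-bind the results into a single tibble.
portal_tabs <- map_dfr(paths, read_xlsx, sheet = "iris", .id = "source")

# Write the combined tibble to a single csv file in one call.
out_file <- file.path(tempdir(), "portal_combined.csv")
write_csv(portal_tabs, out_file)
```

Binding first and writing once avoids appending to the csv file inside a loop, so the header row is written only once.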
