R: Reading several Excel files into different data frames and exploring them at the same time

I am writing some R code to convert 400 Excel files into machine-readable flat files. When these Excel files arrive, the turnaround time is tight, and there is no possibility of receiving the files in a machine-readable format to begin with.

I have R code that pulls out the data in the rows and columns we need, deletes the spaces, and presents it in a machine-readable (MR) format. The issue I need to tackle now is confirming that each of the 400 files is in the correct format for the function to work properly. To do this, I just want to check lots of simple things, e.g. that the column heading 'Title' is in cell A9 in each of the Excel files.

I am new to R and really struggling to write a function that will let me examine all 400 files in one go.

The closest I have got is this:

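library(readxl)

template_dir <- "file path of main directory"
# pattern is a regular expression, so escape the dot and anchor the extension
files <- list.files(path = template_dir, pattern = "\\.xlsx$", full.names = TRUE, recursive = TRUE)
df.files <- lapply(files, read_excel)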

This generates a list with 400 elements. I can display any of these individually without a problem using:

df.files [1]

But if I try to use:

title_loc <- which (df.files [1] == "Title", arr.ind = TRUE)

It does not work; I just get an empty result. I know that 'which' itself works, though: when I read a single Excel file into R as a data frame (or pass in the file path directly), 'which' works fine and returns [1,9] as expected.

The 400 files are spread over several folders (nothing I can do about that either), and I can get a list of all the files using list.files. What I want to do is run a series of simple checks (cell reference for 'title', for 'age', for 'location' and so on) to confirm that all 400 files are laid out in the same way. Ideally the output for 'title' would be collected in one data frame, so I can then check that the column is '1' and the row is '9' for all 400 files.

I think what I want is this:

title_loc <- which (*loop to cycle through every element in df.files* == "Title", arr.ind = TRUE)

But writing the loop is defeating me. Would it be easier to get the file paths for all 400 Excel files in a list and then just cycle through those (rather than using lapply to import all the data)?

Thanks

I'm not sure what machine-readable format is, but if you want to loop through all Excel files in a folder and load them all into R, the following code samples will do that for you.

# load names of Excel files (pattern is a regular expression)
files <- list.files(path = "C:\\your_path_here\\", full.names = TRUE, pattern = "\\.xlsx$")

# create function to read every sheet of one Excel file into a named list of data frames
read_excel_allsheets <- function(filename) {
  sheets <- readxl::excel_sheets(filename)
  sapply(sheets, function(f) as.data.frame(readxl::read_excel(filename, sheet = f)),
         simplify = FALSE)
}

# execute function for all Excel files in "files"
all_data <- lapply(files, read_excel_allsheets)
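
On the specific failure in the question: `df.files[1]` returns a list of length one, whereas `df.files[[1]]` returns the data frame itself, which is probably why `which()` came back empty. Below is a minimal sketch of running the same lookup over every element of a list like `df.files`. The helper `find_label` and the two toy data frames are hypothetical stand-ins for the real workbooks, and the sketch assumes the label occurs at most once per file.

```r
# Hypothetical helper: locate `label` in one data frame.
# Returns a one-row data frame with the row and column index, or NA if absent.
find_label <- function(df, label) {
  # as.matrix() lets the comparison run cell by cell over the whole table
  hits <- which(as.matrix(df) == label, arr.ind = TRUE)
  if (nrow(hits) == 0) {
    return(data.frame(row = NA_integer_, col = NA_integer_))
  }
  # keep only the first match
  data.frame(row = unname(hits[1, "row"]), col = unname(hits[1, "col"]))
}

# Toy stand-ins for the list produced by lapply(files, read_excel)
df.files <- list(
  good = data.frame(a = c("x", "Title"), b = c("y", "z"), stringsAsFactors = FALSE),
  bad  = data.frame(a = c("x", "y"), b = c("Title", "z"), stringsAsFactors = FALSE)
)

# One row per list element, so deviating files stand out
title_loc <- do.call(rbind, lapply(df.files, find_label, label = "Title"))
title_loc$file <- names(df.files)
print(title_loc)
```

With the real data, setting `names(df.files) <- files` before the call would label each row with its file path. Note that `read_excel()` treats the first sheet row as column names by default, so the row index found this way is offset by one from the spreadsheet row unless `col_names = FALSE` is passed.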

Or

library(XLConnect)

testDir <- "C:\\your_path_here\\"

re_file <- ".+\\.xls.?"
testFiles <- list.files(testDir, re_file, full.names = TRUE)

# This function rbinds in a single dataframe
# the content of multiple sheets in the same workbook
# (assuming that all the sheets have the same column types)
rbindAllSheets <- function(file) {
  wb <- loadWorkbook(file)
  sheets <- getSheets(wb)
  do.call(rbind,
          lapply(sheets, function(sheet) {
            readWorksheet(wb, sheet)
          })
  )
}

# Getting a single dataframe for all the Excel files
result <- do.call(rbind, lapply(testFiles, rbindAllSheets))
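
As for the question's last point, cycling through the file paths and reading only the cell of interest is also possible. The following is a rough sketch, not a tested solution for the real files: `check_cell` is a hypothetical helper, its default reader assumes the readxl package is installed and that `read_excel()` with `range = "A9"` and `col_names = FALSE` returns a single-cell result, and `fake_reader` exists only so the logic can be tried without any workbooks.

```r
# Hypothetical helper: read one cell from each workbook and compare it
# with the expected text. `reader` can be swapped out for testing.
check_cell <- function(paths, cell, expected,
                       reader = function(path, cell) {
                         readxl::read_excel(path, range = cell, col_names = FALSE)
                       }) {
  found <- vapply(paths, function(p) {
    value <- reader(p, cell)[[1]][1]  # first column, first row
    as.character(value)
  }, character(1), USE.NAMES = FALSE)
  data.frame(file = paths, found = found,
             ok = !is.na(found) & found == expected,
             stringsAsFactors = FALSE)
}

# Stand-in reader for demonstration only: pretends each "file" holds one value
fake_cells <- c("a.xlsx" = "Title", "b.xlsx" = "Name")
fake_reader <- function(path, cell) {
  data.frame(V1 = fake_cells[[path]], stringsAsFactors = FALSE)
}

report <- check_cell(names(fake_cells), "A9", "Title", reader = fake_reader)

# Files whose cell does not hold the expected label
print(report$file[!report$ok])
```

In real use the call would look like `check_cell(files, "A9", "Title")`, repeated with a different cell and label for each layout check. Reading a single cell per file avoids holding all 400 workbooks in memory at once.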
