Reading multiple Excel files with different layouts into R

Question

I have a collection of a dozen Excel files that I am reading into a list of data frames in R with the following code:

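library(openxlsx)  # readWorkbook()
library(purrr)     # map() and %>%

data_path <- "path"
# pattern is a regular expression, not a glob
files <- dir(data_path, pattern = "\\.xlsx$")

data <- files %>%
  map(~readWorkbook(file.path(data_path, .), sheet = "Results"))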

This reads everything in without problems. The issue is that I need them all in the same format for further manipulation and, because the layout is not uniform, some are imported like this:

X1     2016     2017     2018
y       12       12       12

and others like this:

Result
y         2016       2017       2018
x          12         12         12

The reason is that some Excel files are forwarded to me with an additional row at the top containing the string 'Result'.

Now I could fix this with direct surgery on each one:

# here `data` stands for one of the imported data frames
names(data) <- as.character(unlist(data[1, ]))
names(data)[1] <- "X1"
data <- data[-1, ]

But this seems like a rather ugly hack that will lead to automation problems down the line. Is there a way to use the readWorkbook() function but tell it to skip rows that contain certain values?

E.g., perhaps something like:

if value equal to 'Result' {
  skipRow()
}

Or is there a way to search the data frames for rows of dates and use these as column names?

Answer 1

The easiest solution I can think of here is something like this.

First, import the xlsx files with colNames = FALSE, like so:

data <- files %>%
  map(~readWorkbook(file.path(getwd(), .), sheet = "Sheet1", colNames = FALSE))

Now all you need to do is:

- remove the first row if it contains "Result" in the first column
- assign each xlsx file to its own data frame (optional)
- set the column names for each of these files (optional)

This can be done like so:

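# needs dplyr (filter) and magrittr (%<>%) to be loaded
for (i in seq_along(data)) {
  # Keep rows with an empty first cell; a bare X1 != "Result" would drop them as NA
  data[[i]] %<>% filter(is.na(X1) | X1 != "Result") # same as data[[i]] <- data[[i]] %>% filter(...)
  # Set the names on the data frame itself; names(paste0(...)) <- ... is not a valid assignment
  names(data[[i]]) <- c("Names", "For", "Your", "Columns")
  assign(paste0("FileName", i), as.data.frame(data[[i]]))
}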

Please note the use of the assignment pipe %<>% (from the magrittr package) in the first statement inside the for loop.
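For readers who have not met the assignment pipe, here is a minimal sketch of what it does, using a throwaway vector x (not part of the original code): it pipes the left-hand side into the function and writes the result back to the same name.

```r
library(magrittr)

x <- c(4, 9, 16)

# Pipe x into sqrt() and assign the result back to x;
# shorthand for x <- x %>% sqrt()
x %<>% sqrt()

print(x)
```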

Note: this will remove every row whose first column holds the string "Result", not just the first row.
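If dropping every matching row is too aggressive, one possible variant is to inspect only the first row and then promote the next row to column names, which is essentially the manual fix from the question wrapped in a function. The sketch below uses base R only. promote_header is a hypothetical helper (not part of openxlsx), and the two toy data frames are assumptions standing in for what readWorkbook(..., colNames = FALSE) might return for the two layouts.

```r
# Hypothetical helper: expects a data frame that was read with colNames = FALSE.
promote_header <- function(df, marker = "Result") {
  first_cell <- df[1, 1]
  # Drop the first row only when its first cell equals the marker.
  if (!is.na(first_cell) && first_cell == marker) {
    df <- df[-1, , drop = FALSE]
  }
  # Use the (new) first row as column names.
  header <- as.character(unlist(df[1, ]))
  header[1] <- "X1"  # same label for the first column in both layouts
  df <- df[-1, , drop = FALSE]
  names(df) <- header
  rownames(df) <- NULL
  df
}

# Toy inputs mimicking the two layouts from the question (assumed, not real files).
plain <- data.frame(X1 = c(NA, "y"),
                    X2 = c("2016", "12"),
                    X3 = c("2017", "12"),
                    X4 = c("2018", "12"),
                    stringsAsFactors = FALSE)
extra <- data.frame(X1 = c("Result", "y", "x"),
                    X2 = c(NA, "2016", "12"),
                    X3 = c(NA, "2017", "12"),
                    X4 = c(NA, "2018", "12"),
                    stringsAsFactors = FALSE)

cleaned <- lapply(list(plain, extra), promote_header)
print(cleaned)
```

With the list produced by the import step, the same helper could be applied as data <- map(data, promote_header). The values stay character here, so a numeric conversion would still be needed before further calculations.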

Answer 1 (accepted 2019-11-06 18:03:35)
