How to extract comments from multiple similarly structured Excel files in R?

Question

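I have 1000 .xlsx files with the same structure. Each contains a header row (id, date of filling, item 1 to 11) and a row with values. In most files, the cell under item 11 contains a comment. How can I extract the comments from all files and combine them into a single object in R?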

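I managed to combine all files into a single data.frame by creating a list of files with files <- list.files(pattern = "*.xlsx", full.names = T), reading them with sapply(files, read_excel), and combining the result with bind_rows(). However, read_excel does not import comments. I used the readxl and dplyr packages for this.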

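I also managed to extract the comment from a single file using xlsx_cells("file.xlsx") and x[x$address=="N8", c("address", "comment")], but I don't know how to do this for multiple files. I used the dplyr and tidyxl packages for this approach.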

Thank you very much for your help!

Answer 1

Here's an approach using purrr:

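Edit: changed the solution to output the source file of each comment and to handle files that lack such a comment, since the OP specified that the comment is present in "most" files.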

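library(tidyxl)
library(purrr)
library(stringr)  # str_detect()
library(tibble)   # tibble()
library(tidyr)    # unnest()

# First, here's a list of xlsx files in the directory:
# (the dot is escaped and the pattern anchored so only names ending in ".xlsx" match)
file_list <-  list.files() %>%
  .[str_detect(., "\\.xlsx$")]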
file_list
#[1] "test1.xlsx"            "test2.xlsx"            "test3 no comment.xlsx"


# Make a new tibble with two columns: 
#   file_name   is the source file we're looking at
#   comments    extracts the comments in N8, if any
tibble(file_name = file_list,
       comments = map(file_list,
                      ~ xlsx_cells(.) %>%
                        subset(address == "N8", comment))) %>%
  unnest(comments, keep_empty = TRUE)


## A tibble: 3 x 2
#  file_name             comment          
#  <chr>                 <chr>            
#1 test1.xlsx            Comment in file 1
#2 test2.xlsx            Comment in file 2
#3 test3 no comment.xlsx NA

Solution 1
0 Accepted 2019-09-15 22:49:45
