
Reading an Excel file that has empty sheets into a list and converting it to a data frame in R: error in bind_rows

I have an Excel file with many sheets (more than 70). I read the sheets into a list and convert them to a single data frame with the following function:

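library(openxlsx)  # getSheetNames(), read.xlsx()
library(dplyr)     # bind_rows()

read_excel_allsheets <- function(filename) {

    # Read every sheet into a list of data frames named after the sheets
    sheets <- getSheetNames(filename)
    x <- lapply(sheets, function(X) read.xlsx(filename, sheet = X))
    names(x) <- sheets

    ### Convert to a data frame with the list name as a column
    DF <- bind_rows(x, .id = "SampleName")
    DF
}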

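However, when a sheet has 0 rows and only column headers, bind_rows fails with: Error: Can't combine List1$Name and List2$Name. In the sample data below, the columns of the empty Volunteer sheet come back as logical, while Name is character in the other sheets.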

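I tried the solution from "rbinding a list of data frame R with NULL", but it doesn't work in my case. I need a new column holding the sheet names so that the rows coming from each list element can be told apart.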

Here is some sample data:

      dput(x)
      list(HR = structure(list(Name = c("John", "Jason", "Eliza", "Linda"
          ), Age = c(27, 42, 30, 28), Title = c("HR Genaralist", "HR Manager", 
      "Project Manager", "Safety Manager")), row.names = c(NA, 4L), class = "data.frame"), 
      IT = structure(list(Name = c("Nivin", "Matt", "Jose", "Jacky"
      ), Age = c(35, 28, 40, 50), Title = c("Security Architect", 
      "Manager", "Engineer", "Project Manager")), row.names = c(NA, 
      4L), class = "data.frame"), Scientific = structure(list(Name = c("Betty", 
      "Dan", "Rob", "Bob"), Age = c(35, 40, 43, 45), Title = c("Data Analyst", 
      "Data Analyst", "Data Scientist", "Data Scientist")), row.names = c(NA, 
      4L), class = "data.frame"), Volunteer = structure(list(Name = logical(0), 
      Age = logical(0), Title = logical(0)), row.names = integer(0), class = "data.frame"))
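A quick way to confirm which sheets cause the error is to list the zero-row elements and check the class of one column in each sheet. This is a minimal base-R sketch; the list below is a hypothetical, trimmed copy of the sample data (only HR and Volunteer). It shows that the empty sheet's columns are logical, which is what bind_rows refuses to combine with the character columns from the other sheets.

```r
# Hypothetical trimmed copy of the sample list: one populated sheet, one empty sheet
x <- list(
  HR = data.frame(Name = c("John", "Jason"), Age = c(27, 42),
                  Title = c("HR Genaralist", "HR Manager")),
  Volunteer = data.frame(Name = logical(0), Age = logical(0), Title = logical(0))
)

# Names of the list elements (sheets) that have zero data rows
empty_sheets <- names(Filter(function(df) nrow(df) == 0L, x))
print(empty_sheets)

# Class of the Name column in each sheet; the empty sheet's column is "logical"
name_classes <- vapply(x, function(df) class(df$Name), character(1))
print(name_classes)
```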

Thank you.

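Use rbind() from base R. The sheet names end up only in the row names, and the empty Volunteer sheet contributes no rows: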

do.call(rbind, x)

              Name Age              Title
HR.1          John  27      HR Genaralist
HR.2         Jason  42         HR Manager
HR.3         Eliza  30    Project Manager
HR.4         Linda  28     Safety Manager
IT.1         Nivin  35 Security Architect
IT.2          Matt  28            Manager
IT.3          Jose  40           Engineer
IT.4         Jacky  50    Project Manager
Scientific.1 Betty  35       Data Analyst
Scientific.2   Dan  40       Data Analyst
Scientific.3   Rob  43     Data Scientist
Scientific.4   Bob  45     Data Scientist
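Because base rbind() keeps the sheet names only in the row names (HR.1, HR.2, ...), a real sheet column, as the question asks for, has to be built separately. One sketch is to repeat each list name once per row that element contributed, then drop the row names. The list below is a hypothetical, trimmed copy of the sample data.

```r
# Hypothetical trimmed copy of the sample list, including the empty sheet
x <- list(
  HR = data.frame(Name = c("John", "Jason"), Age = c(27, 42),
                  Title = c("HR Genaralist", "HR Manager")),
  IT = data.frame(Name = "Nivin", Age = 35, Title = "Security Architect"),
  Volunteer = data.frame(Name = logical(0), Age = logical(0), Title = logical(0))
)

# Stack all sheets in list order; the zero-row sheet adds no rows
out <- do.call(rbind, x)

# Repeat each sheet name once per row it contributed (0 times for empty sheets)
out$sheet <- rep(names(x), vapply(x, nrow, integer(1)))
rownames(out) <- NULL

# Move the new column to the front
out <- out[, c("sheet", setdiff(names(out), "sheet"))]
print(out)
```

Counting rows per element avoids parsing the generated row names, which could be ambiguous if a sheet name itself ends in a dot followed by digits.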

Using rbindlist() from data.table, where idcol adds a column holding the list names:

library(data.table)
rbindlist(x, idcol = "sheet")

         sheet  Name Age              Title
 1:         HR  John  27      HR Genaralist
 2:         HR Jason  42         HR Manager
 3:         HR Eliza  30    Project Manager
 4:         HR Linda  28     Safety Manager
 5:         IT Nivin  35 Security Architect
 6:         IT  Matt  28            Manager
 7:         IT  Jose  40           Engineer
 8:         IT Jacky  50    Project Manager
 9: Scientific Betty  35       Data Analyst
10: Scientific   Dan  40       Data Analyst
11: Scientific   Rob  43     Data Scientist
12: Scientific   Bob  45     Data Scientist

To keep each empty data.frame as a single row of NAs, you can preprocess the list first:

x <- lapply(x, function(df) if (nrow(df) == 0L) df[1, ] else df)
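Indexing row 1 of a zero-row data frame returns one row of NAs, and bind_rows accepts those all-NA logical columns alongside the character and numeric ones. A minimal sketch of the whole pipeline, using a hypothetical trimmed copy of the sample data:

```r
library(dplyr)

# Hypothetical trimmed copy of the sample list, including the empty sheet
x <- list(
  HR = data.frame(Name = c("John", "Jason"), Age = c(27, 42),
                  Title = c("HR Genaralist", "HR Manager")),
  IT = data.frame(Name = "Nivin", Age = 35, Title = "Security Architect"),
  Volunteer = data.frame(Name = logical(0), Age = logical(0), Title = logical(0))
)

# Pad every zero-row data frame to a single row of NAs
x <- lapply(x, function(df) if (nrow(df) == 0L) df[1, ] else df)

# Stack the sheets; .id turns the list names into a "sheet" column
combined <- bind_rows(x, .id = "sheet")
print(combined)
```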

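We can also use bind_rows from dplyr after we discard the empty list elements, or keep only the elements that have at least one row (discard() and keep() come from purrr):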

library(dplyr)
library(purrr)
keep(x, ~ nrow(.x) > 0) %>%
     bind_rows(.id = 'sheet')
#       sheet  Name Age              Title
#1          HR  John  27      HR Genaralist
#2          HR Jason  42         HR Manager
#3          HR Eliza  30    Project Manager
#4          HR Linda  28     Safety Manager
#5          IT Nivin  35 Security Architect
#6          IT  Matt  28            Manager
#7          IT  Jose  40           Engineer
#8          IT Jacky  50    Project Manager
#9  Scientific Betty  35       Data Analyst
#10 Scientific   Dan  40       Data Analyst
#11 Scientific   Rob  43     Data Scientist
#12 Scientific   Bob  45     Data Scientist
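The discard() route mentioned above is the mirror image of keep(): it drops the elements for which the predicate is TRUE. A sketch with a hypothetical trimmed copy of the sample data, which should yield the same rows as the keep() version:

```r
library(dplyr)
library(purrr)

# Hypothetical trimmed copy of the sample list, including the empty sheet
x <- list(
  HR = data.frame(Name = c("John", "Jason"), Age = c(27, 42),
                  Title = c("HR Genaralist", "HR Manager")),
  IT = data.frame(Name = "Nivin", Age = 35, Title = "Security Architect"),
  Volunteer = data.frame(Name = logical(0), Age = logical(0), Title = logical(0))
)

# Drop every sheet with no data rows, then stack with a sheet-name column
combined <- x %>%
  discard(~ nrow(.x) == 0) %>%
  bind_rows(.id = "sheet")
print(combined)
```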

If we want to keep each data.frame that has 0 rows as a row of NAs, then:

map_dfr(x, ~ if (nrow(.x) == 0) .x[1, ] else .x, .id = 'sheet') %>%
    as_tibble()
# A tibble: 13 x 4
#   sheet      Name    Age Title             
#   <chr>      <chr> <dbl> <chr>             
# 1 HR         John     27 HR Genaralist     
# 2 HR         Jason    42 HR Manager        
# 3 HR         Eliza    30 Project Manager   
# 4 HR         Linda    28 Safety Manager    
# 5 IT         Nivin    35 Security Architect
# 6 IT         Matt     28 Manager           
# 7 IT         Jose     40 Engineer          
# 8 IT         Jacky    50 Project Manager   
# 9 Scientific Betty    35 Data Analyst      
#10 Scientific Dan      40 Data Analyst      
#11 Scientific Rob      43 Data Scientist    
#12 Scientific Bob      45 Data Scientist    
#13 Volunteer  <NA>     NA <NA>              
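As of purrr 1.0.0, map_dfr() is superseded; the suggested replacement is map() followed by list_rbind(), whose names_to argument plays the role of .id. A sketch assuming purrr >= 1.0.0 and a hypothetical trimmed copy of the sample data:

```r
library(purrr)

# Hypothetical trimmed copy of the sample list, including the empty sheet
x <- list(
  HR = data.frame(Name = c("John", "Jason"), Age = c(27, 42),
                  Title = c("HR Genaralist", "HR Manager")),
  IT = data.frame(Name = "Nivin", Age = 35, Title = "Security Architect"),
  Volunteer = data.frame(Name = logical(0), Age = logical(0), Title = logical(0))
)

# Pad zero-row sheets to one row of NAs, keeping the list names
padded <- map(x, function(df) if (nrow(df) == 0L) df[1, ] else df)

# Row-bind the list; names_to stores each element's name in a "sheet" column
combined <- list_rbind(padded, names_to = "sheet")
print(combined)
```

Dropping the map() step and filtering with keep() or discard() instead gives the version without the NA row.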
 


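Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.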

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM