Reading an Excel file with empty sheets as a list and converting it to a data frame in R: error in bind_rows
I have an Excel file with multiple sheets (> 70). I read the sheets in as a list and convert them to a data frame with the following function.
library(openxlsx)  # getSheetNames(), read.xlsx()
library(dplyr)     # bind_rows()

read_excel_allsheets <- function(filename) {
  sheets <- getSheetNames(filename)
  x <- lapply(sheets, function(X) read.xlsx(filename, sheet = X))
  names(x) <- sheets
  # Convert to a data frame with the list name as a column
  DF <- bind_rows(x, .id = "SampleName")
  DF
}
However, bind_rows fails when a sheet has 0 rows and only column headers. Error: Can't combine List1$Name and List2$Name.
I tried the solution from "rbinding a list of data frame R with NULL", but it did not work in my case. I need a new column holding the sheet name to tell the list elements apart.
Sample data is posted below:
dput(x)
list(HR = structure(list(Name = c("John", "Jason", "Eliza", "Linda"
), Age = c(27, 42, 30, 28), Title = c("HR Genaralist", "HR Manager",
"Project Manager", "Safety Manager")), row.names = c(NA, 4L), class = "data.frame"),
IT = structure(list(Name = c("Nivin", "Matt", "Jose", "Jacky"
), Age = c(35, 28, 40, 50), Title = c("Security Architect",
"Manager", "Engineer", "Project Manager")), row.names = c(NA,
4L), class = "data.frame"), Scientific = structure(list(Name = c("Betty",
"Dan", "Rob", "Bob"), Age = c(35, 40, 43, 45), Title = c("Data Analyst",
"Data Analyst", "Data Scientist", "Data Scientist")), row.names = c(NA,
4L), class = "data.frame"), Volunteer = structure(list(Name = logical(0),
Age = logical(0), Title = logical(0)), row.names = integer(0), class = "data.frame"))
Thank you.
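A quick way to see what bind_rows is objecting to is to compare column classes across the list. The sketch below rebuilds a two-sheet subset of the sample data in base R (the subset and the variable name `col_classes` are illustrative, not from the original post). The zero-row Volunteer sheet carries logical columns, while the populated sheet has character and numeric ones.

```r
# Two-sheet subset of the sample data from the question
x <- list(
  HR = data.frame(
    Name = c("John", "Jason"),
    Age = c(27, 42),
    Title = c("HR Genaralist", "HR Manager"),
    stringsAsFactors = FALSE
  ),
  Volunteer = data.frame(
    Name = logical(0),
    Age = logical(0),
    Title = logical(0)
  )
)

# Build a matrix of column classes: one row per column, one column per sheet
col_classes <- sapply(x, function(df) sapply(df, class))
print(col_classes)
```

The Name row of the result reads character for HR and logical for Volunteer, which appears to be the mismatch that bind_rows reports.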
Using rbind() from base R:
do.call(rbind, x)
              Name Age              Title
HR.1          John  27      HR Genaralist
HR.2         Jason  42         HR Manager
HR.3         Eliza  30    Project Manager
HR.4         Linda  28     Safety Manager
IT.1         Nivin  35 Security Architect
IT.2          Matt  28            Manager
IT.3          Jose  40           Engineer
IT.4         Jacky  50    Project Manager
Scientific.1 Betty  35       Data Analyst
Scientific.2   Dan  40       Data Analyst
Scientific.3   Rob  43     Data Scientist
Scientific.4   Bob  45     Data Scientist
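Note that the rbind() result has no sheet column: the sheet name survives only as a row-name prefix, and the empty Volunteer sheet is dropped. One way to get an explicit column in base R is to repeat each list name by that sheet's row count. This is a sketch on a cut-down copy of the sample data; the helper name `bind_with_sheet` is hypothetical.

```r
# Cut-down copy of the sample data, including one zero-row sheet
x <- list(
  HR = data.frame(Name = c("John", "Jason"), Age = c(27, 42),
                  stringsAsFactors = FALSE),
  IT = data.frame(Name = "Nivin", Age = 35, stringsAsFactors = FALSE),
  Volunteer = data.frame(Name = logical(0), Age = logical(0))
)

# Hypothetical helper: row-bind a named list and add a "sheet" column
bind_with_sheet <- function(lst) {
  rows <- vapply(lst, nrow, integer(1))   # row count per sheet
  bound <- do.call(rbind, lst)            # zero-row sheets contribute nothing
  out <- data.frame(
    sheet = rep(names(lst), times = rows),
    bound,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
  rownames(out) <- NULL                   # drop the "HR.1"-style row names
  out
}

result <- bind_with_sheet(x)
print(result)
```

Because a sheet with zero rows is repeated zero times, the sheet column stays aligned with the bound rows.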
Using data.table:
library(data.table)
rbindlist(x, idcol = "sheet")
         sheet  Name Age              Title
 1:         HR  John  27      HR Genaralist
 2:         HR Jason  42         HR Manager
 3:         HR Eliza  30    Project Manager
 4:         HR Linda  28     Safety Manager
 5:         IT Nivin  35 Security Architect
 6:         IT  Matt  28            Manager
 7:         IT  Jose  40           Engineer
 8:         IT Jacky  50    Project Manager
 9: Scientific Betty  35       Data Analyst
10: Scientific   Dan  40       Data Analyst
11: Scientific   Rob  43     Data Scientist
12: Scientific   Bob  45     Data Scientist
To keep each empty data.frame as a single row of NA, you can preprocess the list:
x <- lapply(x, function(x) if (nrow(x) == 0L) {x[1, ]} else x)
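The preprocessing step relies on how data frame indexing behaves: asking a zero-row data frame for row 1 returns a single row filled with NA rather than an error. A small base R sketch on a cut-down copy of the sample data (the variable names `padded` and `combined` are illustrative):

```r
# Cut-down copy of the sample data, including one zero-row sheet
x <- list(
  HR = data.frame(Name = c("John", "Jason"), Age = c(27, 42),
                  stringsAsFactors = FALSE),
  Volunteer = data.frame(Name = logical(0), Age = logical(0))
)

# Replace each zero-row data frame with a single all-NA row
padded <- lapply(x, function(df) if (nrow(df) == 0L) df[1, ] else df)

# The padded sheet now survives the bind as one NA row
combined <- do.call(rbind, padded)
print(combined)
```

After padding, the Volunteer sheet contributes one row, and its NA values take on the column types of the populated sheets when bound.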
We can also use bind_rows from dplyr after discarding the empty list elements, i.e. keeping only those with at least one row:
library(dplyr)
library(purrr)
keep(x, ~ nrow(.x) > 0) %>%
  bind_rows(.id = 'sheet')
#        sheet  Name Age              Title
#1          HR  John  27      HR Genaralist
#2          HR Jason  42         HR Manager
#3          HR Eliza  30    Project Manager
#4          HR Linda  28     Safety Manager
#5          IT Nivin  35 Security Architect
#6          IT  Matt  28            Manager
#7          IT  Jose  40           Engineer
#8          IT Jacky  50    Project Manager
#9  Scientific Betty  35       Data Analyst
#10 Scientific   Dan  40       Data Analyst
#11 Scientific   Rob  43     Data Scientist
#12 Scientific   Bob  45     Data Scientist
If we want to keep a data.frame with 0 rows as a single row of NA, then:
map_dfr(x, ~ if (nrow(.x) == 0) .x[1, ] else .x, .id = 'sheet') %>%
  as_tibble()
# A tibble: 13 x 4
#   sheet      Name    Age Title
#   <chr>      <chr> <dbl> <chr>
# 1 HR         John     27 HR Genaralist
# 2 HR         Jason    42 HR Manager
# 3 HR         Eliza    30 Project Manager
# 4 HR         Linda    28 Safety Manager
# 5 IT         Nivin    35 Security Architect
# 6 IT         Matt     28 Manager
# 7 IT         Jose     40 Engineer
# 8 IT         Jacky    50 Project Manager
# 9 Scientific Betty    35 Data Analyst
#10 Scientific Dan      40 Data Analyst
#11 Scientific Rob      43 Data Scientist
#12 Scientific Bob      45 Data Scientist
#13 Volunteer  <NA>     NA <NA>