Efficient way to apply function to each row of data frame and return list of data frames
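I'm reading several Excel workbooks. Each one has a different range to read, and the data may sit on a different sheet in each workbook. I'm using a master file that lists the file name, the name I want to give the data, the range to read, and the sheet (if it isn't the first sheet). Here is my master file: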
Files = structure(list(file = c("Alaska.xls", "Analysis of Y-chromosome STRs in Chile.xlsx",
"Bolivia.xlsx", "carribean.xlsx", "Chachapoya.xlsx", "Colombian.XLSX",
"ndigenous Maya population from Guatemala.xlsx", "Nicaragua Nunez.xls",
"Nicaragua.xls", "Palha Brazil.xls", "Patagonia.xls", "Promega Y23 Haplotypes Jun2019.xlsx",
"Roewer et al.XLS", "Rio de Janeiro.xls", "The geographic mosaic of Ecuadorian.xlsx",
"Xu2015Data-original.xlsx"), name = c("Alaska", "Chile", "Bolivia",
"Carribean", "Chachapoya", "Colombian", "Guatemala", "Nicaragua",
"Nicaragua", "Palha", "Patagonia", "Promega", "Roewer", "Rio",
"Ecuador", "Xu"), range = c("G3:X31", "E3:U981", "I4:X230", "C4:S611",
"C2:Y185", "I3:Q80", "D1:S101", "B1:R165", "AQ2:BF167", "G2:AB2534",
"B8:J108", "C2:AT226", "J1:Y1012", "B3:Q608", "G4:AB419", "C2:S981"
), sheet = c("Table S8 Y chromosome STRs", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "")), class = "data.frame", row.names = c(NA,
-16L))
It looks like this:
> Files
                                            file       name     range                      sheet
1                                     Alaska.xls     Alaska    G3:X31 Table S8 Y chromosome STRs
2    Analysis of Y-chromosome STRs in Chile.xlsx      Chile   E3:U981
3                                   Bolivia.xlsx    Bolivia   I4:X230
4                                 carribean.xlsx  Carribean   C4:S611
5                                Chachapoya.xlsx Chachapoya   C2:Y185
6                                 Colombian.XLSX  Colombian    I3:Q80
7  ndigenous Maya population from Guatemala.xlsx  Guatemala   D1:S101
8                            Nicaragua Nunez.xls  Nicaragua   B1:R165
9                                  Nicaragua.xls  Nicaragua AQ2:BF167
10                              Palha Brazil.xls      Palha G2:AB2534
11                                 Patagonia.xls  Patagonia   B8:J108
12           Promega Y23 Haplotypes Jun2019.xlsx    Promega  C2:AT226
13                              Roewer et al.XLS     Roewer  J1:Y1012
14                            Rio de Janeiro.xls        Rio   B3:Q608
15      The geographic mosaic of Ecuadorian.xlsx    Ecuador  G4:AB419
16                      Xu2015Data-original.xlsx         Xu   C2:S981
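I want to loop over each row of this data frame, read the file with `read_excel`, and store each returned data frame in a list whose element names come from the `name` column.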
I tried using just `apply`, which is ugly and doesn't work:
readFiles = function(){
  Files = read.csv(system.file("extdata", "files.csv", package = "purps"))
  Sheets = vector(mode = "list", length = length(Files$File))
  names(Sheets) = Files$Name
  readFile = function(row){
    row = as.list(row)
    path = system.file("extdata", file, package = "purps")
    read_excel(path, range = row$range, sheet = ifelse(row$sheet == "", NULL, row$sheet))
  }
  Sheets = apply(Files, 1, readFile)
  return(Sheets)
}
> readFiles()
Error in file.path(packagePath, ...) :
cannot coerce type 'closure' to vector of type 'character'
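The error can be reproduced without any Excel files. The sketch below isolates its cause, `file` never being assigned inside `readFile()` so that R falls back to the base function `file()` (a closure), plus a second problem that would surface once that is fixed: `ifelse()` cannot return `NULL` for a blank sheet. `read_path()` and `sheet_arg()` are hypothetical helpers written only for this illustration.

```r
# 1. `file` is never assigned inside the function, so R looks it up in the
#    enclosing environments and finds base::file(), which is a function (closure).
read_path <- function(row) file
print(is.function(read_path(list(file = "Alaska.xls"))))

# Handing that closure to a path builder raises the same kind of error as in
# the question ("cannot coerce type 'closure' to vector of type 'character'").
path_err <- tryCatch(file.path("extdata", file), error = conditionMessage)
print(path_err)

# 2. ifelse() cannot return NULL, so the blank-sheet branch fails as well.
sheet_err <- tryCatch(ifelse("" == "", NULL, "Sheet1"), error = conditionMessage)
print(sheet_err)

# A scalar if/else can return NULL, which read_excel() treats as "use the first sheet".
# Hypothetical helper: blank or missing sheet -> NULL, otherwise the sheet name.
sheet_arg <- function(sheet) if (is.na(sheet) || sheet == "") NULL else sheet
print(is.null(sheet_arg("")))
print(sheet_arg("Table S8 Y chromosome STRs"))
```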
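I'm sure there's an elegant solution using `purrr` or something else I'm not aware of, and I'm also sure I could do this with a loop. But there must be a more compact way.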
You can split `Files` into a list of one-row data frames and then pass each one to a `readFiles` function with `lapply`:
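readFiles = function(row){
  # row is a one-row data frame, so take the file name from it;
  # a bare `file` would resolve to the base function file()
  path = system.file("extdata", row$file, package = "purps")
  # ifelse() cannot return NULL; a blank sheet means "use the first sheet"
  sheet = if (is.na(row$sheet) || row$sheet == "") NULL else row$sheet
  data <- readxl::read_excel(path, range = row$range, sheet = sheet)
  return(data)
}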
list_data <- lapply(split(Files, seq(nrow(Files))), readFiles)
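To see what `lapply` receives here, the sketch below splits a small stand-in for `Files` (`files_demo` is made-up data built from the first and third rows of the master file). Each element is a one-row data frame, so `row$range` and the other columns are length-1 vectors, and the elements are named `"1"`, `"2"`, and so on, which is why the result has to be renamed afterwards.

```r
# Two-row stand-in for Files (illustrative data only).
files_demo <- data.frame(
  file  = c("Alaska.xls", "Bolivia.xlsx"),
  name  = c("Alaska", "Bolivia"),
  range = c("G3:X31", "I4:X230"),
  sheet = c("Table S8 Y chromosome STRs", ""),
  stringsAsFactors = FALSE
)

# split() by row index: one element per row.
rows <- split(files_demo, seq(nrow(files_demo)))
print(length(rows))      # one element per row of files_demo
print(names(rows))       # names come from the grouping values, i.e. row numbers
print(class(rows[[2]]))  # each element is still a data.frame
print(rows[[2]]$range)   # column access yields a plain length-1 vector
```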
To name the list, you can then do:
names(list_data) <- Files$name
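A more compact variant, closer to what the question asks for, walks the columns in parallel with base `Map()` (`purrr::pmap()` follows the same row-wise pattern if you prefer purrr). This is a sketch, not a drop-in replacement: `read_workbooks()`, its `dir` argument and its injectable `reader` argument are hypothetical, added so the logic can be checked without real workbooks. It also passes the names through `make.unique()`, because `Nicaragua` appears twice in the master file and assigning `Files$name` directly would give two elements the same name.

```r
# Hypothetical helper: read every workbook listed in `files`
# (columns file, name, range, sheet) and return a named list of data frames.
# `dir` is the folder holding the workbooks; `reader` defaults to
# readxl::read_excel() and is only evaluated if no other reader is supplied.
read_workbooks <- function(files, dir = ".", reader = readxl::read_excel) {
  out <- Map(function(file, range, sheet) {
    # A blank or missing sheet becomes NULL so the reader uses its default sheet.
    sheet <- if (is.na(sheet) || sheet == "") NULL else sheet
    reader(file.path(dir, file), range = range, sheet = sheet)
  }, files$file, files$range, files$sheet)
  # Duplicate names ("Nicaragua" twice) become "Nicaragua", "Nicaragua.1".
  names(out) <- make.unique(files$name)
  out
}
```

With the package layout from the question, the call would be something like `read_workbooks(Files, dir = system.file("extdata", package = "purps"))`.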