简体   繁体   中英

Apply a function to the rows of a data frame return a list of data frames

I am reading multiple Excel workbooks which have different ranges to be read, and potentially live in different sheets inside each workbook. I am using a master file which contains the file name, the name I want to call the data, the range to be read, and the sheet (if it isn't sheet 1). This is my master file:

Files = structure(list(file = c("Alaska.xls", "Analysis of Y-chromosome STRs in Chile.xlsx", 
"Bolivia.xlsx", "carribean.xlsx", "Chachapoya.xlsx", "Colombian.XLSX", 
"ndigenous Maya population from Guatemala.xlsx", "Nicaragua Nunez.xls", 
"Nicaragua.xls", "Palha Brazil.xls", "Patagonia.xls", "Promega Y23 Haplotypes Jun2019.xlsx", 
"Roewer et al.XLS", "Rio de Janeiro.xls", "The geographic mosaic of Ecuadorian.xlsx", 
"Xu2015Data-original.xlsx"), name = c("Alaska", "Chile", "Bolivia", 
"Carribean", "Chachapoya", "Colombian", "Guatemala", "Nicaragua", 
"Nicaragua", "Palha", "Patagonia", "Promega", "Roewer", "Rio", 
"Ecuador", "Xu"), range = c("G3:X31", "E3:U981", "I4:X230", "C4:S611", 
"C2:Y185", "I3:Q80", "D1:S101", "B1:R165", "AQ2:BF167", "G2:AB2534", 
"B8:J108", "C2:AT226", "J1:Y1012", "B3:Q608", "G4:AB419", "C2:S981"
), sheet = c("Table S8 Y chromosome STRs", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "")), class = "data.frame", row.names = c(NA, 
-16L))

And it looks like this:

> Files
                                            file       name     range                      sheet
1                                     Alaska.xls     Alaska    G3:X31 Table S8 Y chromosome STRs
2    Analysis of Y-chromosome STRs in Chile.xlsx      Chile   E3:U981                           
3                                   Bolivia.xlsx    Bolivia   I4:X230                           
4                                 carribean.xlsx  Carribean   C4:S611                           
5                                Chachapoya.xlsx Chachapoya   C2:Y185                           
6                                 Colombian.XLSX  Colombian    I3:Q80                           
7  ndigenous Maya population from Guatemala.xlsx  Guatemala   D1:S101                           
8                            Nicaragua Nunez.xls  Nicaragua   B1:R165                           
9                                  Nicaragua.xls  Nicaragua AQ2:BF167                           
10                              Palha Brazil.xls      Palha G2:AB2534                           
11                                 Patagonia.xls  Patagonia   B8:J108                           
12           Promega Y23 Haplotypes Jun2019.xlsx    Promega  C2:AT226                           
13                              Roewer et al.XLS     Roewer  J1:Y1012                           
14                            Rio de Janeiro.xls        Rio   B3:Q608                           
15      The geographic mosaic of Ecuadorian.xlsx    Ecuador  G4:AB419                           
16                      Xu2015Data-original.xlsx         Xu   C2:S981        

I would like to iterate over each row of this data frame, and use read_excel to read in the file, and store the returned data frame in a list with its name set to name .

I tried using just apply which is ugly and doesn't work:

readFiles = function(){
  Files = read.csv(system.file("extdata", "files.csv", package = "purps"))

  Sheets = vector(mode = "list", length = length(Files$File))
  names(Sheets) = Files$Name

  readFile = function(row){
    row = as.list(row)
    path = system.file("extdata", file, package = "purps")
    read_excel(path, range = row$range, sheet = ifelse(row$sheet == "", NULL, row$sheet))
  }

  Sheets = apply(Files, 1, readFile)

  return(Sheets)
}
> readFiles()
 Error in file.path(packagePath, ...) : 
  cannot coerce type 'closure' to vector of type 'character' 

I am sure there is an elegant solution using purrr or something else I don't know about, I am also sure I could just do this with a loop. but there has to be a more compact way.

You could try to split every row in Files as list of dataframes and then pass it to readFiles function.

readFiles = function(row){
   path = system.file("extdata", file, package = "purps")
   data <- readxl::read_excel(path, range = row$range, 
                   sheet = ifelse(row$sheet == "", NULL, row$sheet))
   return(data)
}

list_data <- lapply(split(Files, seq(nrow(Files))), readFiles)

To name the list you could do:

names(list_data) <- Files$name

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM