
How do I rewrite these for loops to lapply in R

Multiple plots with one for-loop

I'm working on a plotting function and wrote these for-loops. I keep reading that for loops are bad for memory use in R and that I should use apply or one of its variants instead. But I don't understand which data frame or list should be passed as the first argument.

I want to replace this code with something that uses apply:

BasicPlot(depth, var[,1], xtitle=xlab)
for(i in 2:ncol(var))
    BasicPlot(depth, var[,i], add=TRUE, xtitle=xlab, ...) 

See my DepthPlotter project on github if you want to know what I'm trying to achieve.

EDIT: after reading this site about apply functions I found a solution:

lapply(2:ncol(var), function(i) { BasicPlot(depth, var[,i], add=TRUE, xtitle=xlab, ...)})

This worked but gave me silly output, in this case a list of [[1]] NULL [[2]] NULL etc. which I was able to silence by surrounding the code with invisible(...) .

Is this in fact better than the previous code? Is it a) easier to read and b) faster?


Read multiple files in multiple folders: a double for-loop

The second problem I'm trying to tackle with an apply function instead of the for-loops I'm currently thinking of:

I want to read multiple raster images (named 1.png through 8.png ), which are located in separate folders (named 959D22 through 959D41 ). I want to name the list items after the folder names and filenames. This should return a list of raster images which I can then add to my plots at specific values.

library(png) # provides readPNG

cores <- list.files("data/core_splitpics/") # folder names
pics <- list() # according to replies below, this is the sort of thing that makes for-loops bad in R, because I'm expanding a list step by step.
for(i in seq_along(cores)){ # loop over folders
    imgs <- list.files(file.path("data/core_splitpics", cores[i])) # filenames in folder i
    pics[[cores[i]]] <- list() # one sub-list per folder, named after the folder
    for(j in seq_along(imgs)){ # loop over files
        pics[[cores[i]]][[imgs[j]]] <- as.raster(readPNG(file.path("data/core_splitpics",
            cores[i], imgs[j]))) # something like this
    }
}

After reading up, I still don't know what the best way to build this list is. Maybe by creating list names first and then adding the raster images to those entries? Is a variation of apply better here since I want to return a value?

For your first question: lapply adds complexity because it returns a list you never use. The for loop is more straightforward but slower; whether that overhead is significant depends on the speed of the called function.
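
A minimal sketch of the two options for a call that is made only for its side effect. Here draw_line is a hypothetical stand-in for BasicPlot that just records which columns it was asked to draw, and var is dummy data; neither comes from the DepthPlotter code.

```r
# Hypothetical stand-in for BasicPlot: records the column index instead of plotting.
drawn <- integer(0)
draw_line <- function(i) {
  drawn <<- c(drawn, i)  # side effect only, no useful return value
  invisible(NULL)
}

var <- matrix(1:12, ncol = 4)  # dummy data with 4 columns

# Option 1: plain for loop, nothing is returned or printed.
for (i in seq_len(ncol(var))[-1]) draw_line(i)
drawn_for <- drawn

# Option 2: lapply builds a list of NULLs, one per call, that is then thrown away.
drawn <- integer(0)
res <- lapply(seq_len(ncol(var))[-1], draw_line)
drawn_lapply <- drawn
```

Both variants visit columns 2 to 4. Using seq_len(ncol(var))[-1] rather than 2:ncol(var) also copes with a single-column var, where 2:1 would count down instead of being empty.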


For the second part I would do it this way. The example uses dummy input, so I kept the inner loop; it could be avoided if the function called inside it accepted vector input, unlike readPNG, which takes one file at a time:

cores <- list("A", "B", "C")
pics <- rep(list(vector("character",1)),length(cores))
for(i in 1:length(cores)) {
  imgs <- list("1","2","3")
  pics[[i]] <- vector("character",length(imgs))
  for(j in 1:length(imgs)) {
    pics[[i]][j] <- paste(cores[[i]],imgs[j],sep="/")
  }
}

This way you don't grow and copy the list on each iteration; instead you allocate as few times as possible.
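
To make the difference concrete, here is a small sketch contrasting a list that grows inside the loop with one that is allocated up front. The names grown and prealloc are illustrative only.

```r
n <- 5

# Growing: the list starts empty and is extended on every iteration.
grown <- list()
for (i in seq_len(n)) {
  grown[[i]] <- i^2
}

# Preallocating: create all n slots first, then fill them in place.
prealloc <- vector("list", n)
for (i in seq_len(n)) {
  prealloc[[i]] <- i^2
}

identical(grown, prealloc)  # same result, only the allocation pattern differs
```

For five elements the cost is negligible; the pattern matters more as the number of iterations and the size of each element increase.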

Output:

> pics
[[1]]
[1] "A/1" "A/2" "A/3"

[[2]]
[1] "B/1" "B/2" "B/3"

[[3]]
[1] "C/1" "C/2" "C/3"

For easier access you can do names(pics) <- cores to get:

> pics
$A
[1] "A/1" "A/2" "A/3"

$B
[1] "B/1" "B/2" "B/3"

$C
[1] "C/1" "C/2" "C/3"

and so you can access each core separately with for example pics$A .

Finally, if you want to work over all files, just call unlist(pics) to get a vector of all files, which you can pass to a for loop, sapply, or any other function that takes a vector as input.

> for(p in unlist(pics)) { print(p) }
[1] "A/1"
[1] "A/2"
[1] "A/3"
[1] "B/1"
[1] "B/2"
[1] "B/3"
[1] "C/1"
[1] "C/2"
[1] "C/3"
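
Applied to real folders, the same idea can be written with nested lapply calls that name each entry after its folder and file. This is only a sketch: read_tree is a hypothetical helper, and the demonstration uses throwaway text files with readLines as the reader so that it runs without the png package.

```r
# Hypothetical helper: read every file in every sub-folder of `root`
# and return a nested list named by folder and by file.
read_tree <- function(root, reader) {
  folders <- list.files(root)
  out <- lapply(folders, function(folder) {
    files <- list.files(file.path(root, folder))
    imgs <- lapply(file.path(root, folder, files), reader)
    setNames(imgs, files)
  })
  setNames(out, folders)
}

# Demonstration data: two folders with two small text files each.
root <- file.path(tempdir(), "core_splitpics_demo")
for (folder in c("959D22", "959D23")) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  for (f in c("1.txt", "2.txt")) {
    writeLines(paste(folder, f), file.path(root, folder, f))
  }
}

pics <- read_tree(root, readLines)
pics$`959D22`$`1.txt`  # content of the first file in the first folder
```

For the raster images in the question, the reader argument could be something like function(f) as.raster(png::readPNG(f)), assuming the png package is installed.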

To give an idea of the performance difference, I ran a small benchmark:

test.for <- function() {
  cores <- LETTERS[1:26]
  pics <- rep(list(vector("character",1)),length(cores))
  for(i in 1:length(cores)) {
    imgs <- 1:8
    pics[[i]] <- vector("character",length(imgs))
    for(j in 1:length(imgs)) {
      pics[[i]][j] <- paste(cores[[i]],imgs[j],sep="/")
    }
  }
  return(pics)
}

test.lapply <- function() {
  cores <- LETTERS[1:26]
  pics <- lapply( seq_along(cores), 
                   function(i) {
                     imgs <- 1:8
                     return(unlist(lapply( seq_along(imgs), 
                                          function(j) {
                                            paste( cores[[i]],
                                                   imgs[j],
                                                   sep="/"
                                                 )
                                          })
                                   )
                            )
                   })
  return(pics)
}

library(microbenchmark) # provides microbenchmark()

identical(test.for(),test.lapply())
microbenchmark(test.for(),test.lapply(),times=10L)

Results:

> identical(test.for(),test.lapply())
[1] TRUE
> microbenchmark(test.for(),test.lapply(),times=10L)
Unit: microseconds
          expr      min       lq     mean   median       uq      max neval
    test.for() 1241.166 1279.239 1392.894 1318.636 1405.375 1724.522    10
 test.lapply()  997.502 1013.393 1044.083 1024.152 1042.196 1155.090    10

The for loop is not much slower in this use case (26 letters by 8 numbers): about 1.3 ms versus 1.0 ms at the median. The lapply version could probably be improved too.
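
One possible improvement, sketched here without any timing claim, is to drop the inner lapply and let paste work on the whole vector of image numbers at once. The name test.vectorized is made up for this example; the function is meant to build the same list as test.for above.

```r
test.vectorized <- function() {
  cores <- LETTERS[1:26]
  imgs <- 1:8
  # paste() recycles the single core name across all eight image numbers
  lapply(cores, function(core) paste(core, imgs, sep = "/"))
}

pics_vec <- test.vectorized()
pics_vec[[1]]
```

This only works because paste accepts vector input; a function that reads one file per call would still need a loop or an inner lapply.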

