
Function, Vectors, and Loops in R

I recently began experimenting with R as a language for genetic programming. I have slowly but surely been learning more about how R works and its best coding practices, but I have hit a roadblock. Here is my situation: I have a dataset with roughly 700 rows, and each row has about 400 columns. I have everything set up so that a function with as many parameters as there are columns is passed as an argument to an evaluation (fitness scoring) function. I want to go through the dataset row by row and pass the values of each column in a row into the function being evaluated. The first problem was figuring out how to pass the parameters separately into the function. By "separately" I mean that the function expects 400 parameters, not one vector of length 400. To do this I used the following:

do.call(f, as.list(parameters))  # f is the function being evaluated

Here, parameters is the vector of values from one row of the dataset with a month variable (1-12) appended. This works fine: I used a for loop to iterate over the 700 rows and a nested loop over the 12 months, accumulating a vector of outputs with the call above. The problem is that this is painfully slow, around 24-28 seconds per function, and 100-500 functions are sent into this evaluation every generation of evolution. The bottom line is that this is not the way to go. Next I tried sapply() as below:

outputs <- sapply(1:12, function(m) sapply(rows, function(p) do.call(f, as.list(c(p, m)))))

The outer sapply() runs over the months (1-12) and the inner one over the rows of the dataset (1-700). This took just as long. Any ideas for solutions would be helpful.
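Both versions still call do.call() once for every (row, month) pair, roughly 700 × 12 = 8,400 calls per candidate function, so switching from for loops to sapply() changes the syntax but not the amount of work. A minimal sketch on a toy 5 × 3 matrix, assuming a hypothetical 4-argument f standing in for one evolved function:

```r
# Hypothetical stand-in for one evolved individual: 3 data columns + month.
f <- function(x1, x2, x3, month) x1 * x2 + x3 - month

set.seed(1)
X <- matrix(runif(5 * 3), nrow = 5, ncol = 3)  # toy stand-in for the 700 x 400 data
rows <- split(X, row(X))                        # list of row vectors, one per row

# Nested for loops: one do.call() per (row, month) pair.
loop_out <- matrix(NA_real_, nrow = nrow(X), ncol = 12)
for (i in seq_len(nrow(X))) {
  for (m in 1:12) {
    loop_out[i, m] <- do.call(f, as.list(c(X[i, ], m)))
  }
}

# Nested sapply(): exactly the same number of do.call() invocations, just hidden.
sapply_out <- sapply(1:12, function(m) {
  sapply(rows, function(p) do.call(f, as.list(c(p, m))))
})
```

Both produce the same rows × months matrix (sapply_out additionally carries row names from split()), which is why neither is meaningfully faster than the other.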

The main problem in cases like this is usually that the approach you are taking is the wrong one. I don't know enough about your specific case, but:

  1. Try to vectorize the calculations, so your function operates on ALL rows at once instead of one at a time.
  2. If you just store numbers in a data.frame, converting it to a matrix will usually speed up many operations.
  3. Don't write functions that take 400 parameters! 5 is probably on the high side too.
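Points 1 and 2 can be combined: convert the data to a matrix once and write the evolved function so it works on whole columns. A minimal sketch, assuming a hypothetical f_vec whose body uses column operations (x[, 1] and so on) instead of scalar arguments, and stacking the data 12 times so that a single call covers every (row, month) pair:

```r
set.seed(1)
df <- as.data.frame(matrix(runif(5 * 3), nrow = 5, ncol = 3))  # toy stand-in for the 700 x 400 data
X  <- as.matrix(df)  # numeric matrix: column access is much cheaper than on a data.frame

# Hypothetical evolved function written in vectorized form:
# x is a numeric matrix (one row per observation), month a vector of length nrow(x).
f_vec <- function(x, month) x[, 1] * x[, 2] + x[, 3] - month

# Repeat every row once per month: rows 1..n for month 1, then rows 1..n for month 2, ...
months <- rep(1:12, each = nrow(X))
X_all  <- X[rep(seq_len(nrow(X)), times = 12), , drop = FALSE]

# One call for all rows and all months; reshape into a rows x months matrix.
out_vec <- matrix(f_vec(X_all, months), nrow = nrow(X), ncol = 12)
```

For the real data the stacked matrix is 8,400 × 400 doubles (about 27 MB). If that is a concern, calling f_vec(X, rep(m, nrow(X))) once per month still cuts 8,400 calls down to 12.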

EDIT: Since you generate the function, you should be able to generate a different version instead, one that takes a single vector of values rather than that many parameters. Note that the vector you pass it can have names:

# Convert this:
f <- function(foo, bar) {
  foo+bar
}
do.call(f, list(foo=42, bar=13))

# To this:
f <- function(args) {
  args[["foo"]] + args[["bar"]] 
  # or, faster, by position (R indexing starts at 1):
  #args[[1]] + args[[2]]
  # or fastest:
  #sum(args)
}
do.call(f, list(args=c(foo=42, bar=13)))
# or, simply
f(c(foo=42, bar=13))
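Applied to the dataset, the vector-argument version can be called directly on each row with the month appended, with no do.call() and no 400-element list. A minimal sketch, assuming hypothetical column names foo, bar and baz and a named month element:

```r
# Vector-argument version of a hypothetical evolved function (elements accessed by name).
f <- function(args) args[["foo"]] * args[["bar"]] + args[["baz"]] - args[["month"]]

set.seed(1)
X <- matrix(runif(5 * 3), nrow = 5, ncol = 3,
            dimnames = list(NULL, c("foo", "bar", "baz")))

# apply() passes each row as a named vector; c() appends the month under its own name.
out <- sapply(1:12, function(m) apply(X, 1, function(row) f(c(row, month = m))))
```

This removes the argument-list overhead, but it is still 8,400 separate R-level calls; writing the generated function in vectorized form removes those as well.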

... calling a function with 1 parameter instead of 400 is about 60x faster! But note that this is only the overhead of calling the function. You need to measure how long the function body itself takes, too: if that is a second or more, it doesn't matter how efficiently you call it or how efficient your loops are.
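To check the call overhead on your own machine, you can build a 400-parameter function from source text (the way a GP system might generate it) and time it against a one-argument vector version. A minimal sketch, assuming hypothetical names f_many, f_vec and x1..x400 and a trivial body so that mostly the call itself is measured; no particular speedup is implied:

```r
n <- 400
arg_names <- paste0("x", seq_len(n))

# Generate "function(x1, x2, ..., x400) x1 + x2" from text and evaluate it.
src    <- sprintf("function(%s) x1 + x2", paste(arg_names, collapse = ", "))
f_many <- eval(parse(text = src))

# Equivalent one-argument version.
f_vec <- function(args) args[[1]] + args[[2]]

set.seed(1)
vals <- runif(n)
reps <- 2000

t_many <- system.time(for (i in seq_len(reps)) do.call(f_many, as.list(vals)))
t_vec  <- system.time(for (i in seq_len(reps)) f_vec(vals))
print(t_many)
print(t_vec)
```

Timing one complete fitness evaluation with system.time() in the same way shows whether the function body, rather than the call, dominates the runtime.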
