简体   繁体   中英

More efficient function and for loop

I am trying to do a more efficient for loop. I know the existence of sapply, laaply, etc. but I don't know how to implement it in my code.

I have my function which I don't know if it is very efficient. I think I should improve this but I don't know how.

myfun <- function(a, b, c) {
  sum <- 0
  iter <- 0
  while (sum < c) {
    nr <- runif(1, a, b)
    sum <- sum + nr
    iter <- iter + 1
  }
  return(iter)
}

And here is the part which I would like to use an sapply or something similar.

a <- 0
b <- 1
c <- 2
x <- 0
for (i in 1:10^9) {
  x <- x + myfun(a, b, c)
}

Also, I need to make a sapply similar to this

sapply(1:10^9, functie(a ,b ,c)) 

But the sapply uses 1:10^9 as parameters, instead of a, b, c.

You're doing it right, in my honest opinion. Since you don't need to return a vectorized or multi-dimensional result but instead update an existing object at each iteration, the for loop you're suggesting is more than adequate.

If you want to take a look at some great discussion about this topic I suggest you to look at this link: https://r4ds.had.co.nz/iteration.html

Edit: just to address the speed argument

start <- Sys.time()
purrr::map_dbl(1:1000, function(x) y + myfun(a, b, c)) %>% sum
end <- Sys.time()
end - start

# Time difference of 0.02593184 secs

start <- Sys.time()
y <- replicate(1000, myfun(a,b,c))
cumsum(y)[1000]
end <- Sys.time()
end - start

# Time difference of 0.01755929 secs

y <- 0
start <- Sys.time()
for(i in 1:1000){
  y<- y + myfun(a,b,c)
}
end <- Sys.time()
end - start

# Time difference of 0.01459098 secs

I think replicate() is what you may be looking for (I changed your n to something smaller).

set.seed(1234)

n <- 10^2

y <- replicate(n, myfun(a,b,c))
sum(y)
# [1] 462

This matches your prior result.

set.seed(1234)

a <-0
b <-1
c <-2
x <-0
for (i in 1:n){
  x <- x + myfun(a,b,c)
}

x
# [1] 462

I would probably solve this using purrr::map() . Eg like this:

c(1:1e9) %>% 
  purrr::map_dbl(
    ~ myfun(a, b, c)
  ) %>% 
  sum()

This first calls myfun() the same number of times as the length of c(1:1e9) , and stores the results in a numeric vector, then it uses sum() to add the results together.

My tests shows it's a bit faster than using replicate() .

Here are some options

  • A base R recursion method
f_TIC <- function(x, y, z) ifelse(z <= 0, 0, f_TIC(x, y, z - runif(1, x, y)) + 1)
  • Rcpp implementation of f_TIC
library(Rcpp)
cppFunction("
int f_TIC_cpp(double x, double y, double z) {
  if (z <= 0) {
    return 0;
  } else {
    return f_TIC_cpp(x, y, z- R::runif(0,1))+1;
  }
}
")

Benchmarking

library(Rcpp)

f <- function(s = 0) {
  if (s[length(s)] >= 2) {
    return(length(s) - 1L)
  } else {
    f(c(s, s[length(s)] + runif(1, 0L, 1L)))
  }
}

f_TIC <- function(x, y, z) ifelse(z <= 0, 0, f_TIC(x, y, z - runif(1, x, y)) + 1)


cppFunction("
double myfun_cpp() {
  double s = 0;
  int i = 0;
  while (s < 2) {
    s = s + R::runif(0, 1);
    i++;
  }
  return i;
}
")

cppFunction("
int f_TIC_cpp(double x, double y, double z) {
  if (z <= 0) {
    return 0;
  } else {
    return f_TIC_cpp(x, y, z- R::runif(0,1))+1;
  }
}
")

myfun <- function(a, b, c) {
  sum <- 0
  iter <- 0
  while (sum < c) {
    nr <- runif(1, a, b)
    sum <- sum + nr
    iter <- iter + 1
  }
  return(iter)
}

set.seed(42)
R <- 1e3
microbenchmark::microbenchmark(
  f = replicate(R, f()),
  f_TIC = replicate(R, f_TIC(0, 1, 2)),
  f_TIC_cpp = replicate(R, f_TIC_cpp(0,1,2)),
  myfun_cpp = replicate(R, myfun_cpp()),
  myfun = replicate(R, myfun(0, 1, 2)), 
  times = 1e2L,
  control = list(warmup = 1e1L)
)

and we will see

Unit: milliseconds
      expr     min       lq      mean   median       uq     max neval
         f 11.9342 12.50330 14.161982 13.02100 14.96575 22.7116   100
     f_TIC 20.1925 21.69420 23.678240 22.28255 24.86350 34.1577   100
 f_TIC_cpp  2.0293  2.10080  2.639625  2.17505  2.36190  7.9715   100
 myfun_cpp  1.7351  1.79415  2.094577  1.83810  2.00495  6.7481   100
     myfun  9.1408  9.45240 11.783504 10.32355 14.68815 19.5400   100

Here is a recursive function f() that does the same job as myfun() .

f <- function(s=0) {
  if (s[length(s)] >= 2) {
    return(length(s) - 1L)
  } else {
    f(c(s, s[length(s)] + runif(1, 0L, 1L)))
  }
}

set.seed(42)

f()
# [1] 3

replicate(8, f())
# [1] 4 5 4 4 3 5 3 5

stopifnot(all.equal({set.seed(42);f()}, {set.seed(42);myfun(0, 1, 2)}))

However (and most likely for that reason), it's just cooler, not faster:

# Unit: milliseconds
#  expr      min       lq     mean   median       uq      max neval cld
#     f 21.57227 22.01614 23.61562 22.30010 26.18903 28.19850   100   b
# myfun 16.20270 16.52542 17.76446 16.70385 19.44336 22.15172   100  a 

set.seed(42); R <- 1e3
microbenchmark::microbenchmark(
  f=replicate(R, f()), myfun=replicate(R, myfun(0, 1, 2)), times=1e2L,
  control=list(warmup=1e1L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM