Clean comparison of the performance of different approaches with the same goal in R?
In R, how can I cleanly compare different solutions to the same problem while being "fair" to each of them? Could running a resource-consuming solution before the others alter the performance of the later ones? How could one "clean" the state of the machine between tests?
Suppose I want to compute the column means of a matrix. I could do it the easy way or the complicated way:
set.seed(9)
N = 1e7
ncol = 1e3
myT = matrix(runif(N), ncol = ncol)
func1 <- function(mat) {
  colMeans(mat)
}
func2 <- function(mat) {
  apply(mat, 2, function(x) sum(x)/length(x))
}
func3 <- function(mat) {
  nrows = c()
  for (i in 1:nrow(mat)) {
    nrows = c(nrows, 1) # yes, this is very stupid ;-)
  }
  colSums(mat) / sum(nrows)
}
system.time( replicate(1, t1 <- func1(myT)))
# user system elapsed
# 0.012 0.000 0.011
system.time( replicate(1, t2 <- func2(myT)))
# user system elapsed
# 0.136 0.036 0.170
system.time( replicate(1, t3 <- func3(myT)))
# user system elapsed
# 0.140 0.032 0.170
Running system.time() several times can give different results for the same test (possibly altering the conclusions). I noticed this was especially the case for the more complicated, resource-consuming solutions, while the cleanest ones tend to have a more consistent execution time. What is the reason for this? How can I avoid big variations between executions of the same expression, and how can I prevent the tests from interfering with each other?
Is a call to gc() between tests useful, and is it enough?
I also know about the microbenchmark package, but I am looking for something more "manual" in order to understand what happens.
I am working with RStudio, in case it matters...
The microbenchmark package was designed for this. system.time() is not as detailed.
set.seed(9)
N = 1e5
ncol = 1e3
myT = matrix(runif(N), ncol = ncol)
library(microbenchmark)
microbenchmark(
  colmeans = colMeans(myT),
  wrong_apply = apply(myT, 2, function(x) sum(x)/length(x)), # wrong in case of NA
  correct_apply = apply(myT, 2, mean, na.rm = TRUE), # correct in case of NA
  stupid = {
    nrows = c()
    for (i in 1:nrow(myT)) {
      nrows = c(nrows, 1) # yes, this is very stupid ;-)
    }
    colSums(myT) / sum(nrows)
  }
)
Output:
Unit: microseconds
          expr      min       lq       mean   median       uq       max neval cld
      colmeans   87.235   92.367   96.44175   95.787   98.781   129.142   100 a
   wrong_apply 3004.886 3071.595 3483.02090 3166.739 3267.445 18707.947   100 b
 correct_apply 7595.387 7895.148 8850.87886 8106.179 8461.745 13928.438   100 c
        stupid  144.109  156.510  166.15237  163.351  171.690   255.290   100 a
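For the more "manual" route the question asks about, the following is a minimal sketch of repeated timing in base R, not a replacement for microbenchmark. The helper name manual_bench and its arguments are hypothetical, made up for this illustration. It relies on two things: system.time() already runs a garbage collection before timing when gcFirst = TRUE (its default), and shuffling the run order on every repetition keeps any one candidate from always running right after an expensive one.

```r
# Hypothetical helper (not part of base R or microbenchmark):
# time several zero-argument functions, interleaved in random order.
manual_bench <- function(funs, times = 20L) {
  stopifnot(is.list(funs), !is.null(names(funs)))
  # One row per repetition, one column per candidate
  elapsed <- matrix(NA_real_, nrow = times, ncol = length(funs),
                    dimnames = list(NULL, names(funs)))
  for (r in seq_len(times)) {
    # Shuffle the order so no candidate always follows a heavy one
    for (nm in sample(names(funs))) {
      # gcFirst = TRUE (the default) collects garbage before the timed call
      elapsed[r, nm] <- system.time(funs[[nm]](), gcFirst = TRUE)[["elapsed"]]
    }
  }
  # Summarise each candidate's elapsed times (in seconds)
  t(apply(elapsed, 2, function(x) {
    c(min = min(x), median = median(x), max = max(x))
  }))
}

set.seed(9)
m <- matrix(runif(1e5), ncol = 1e3)
candidates <- list(
  colmeans  = function() colMeans(m),
  apply_sum = function() apply(m, 2, function(x) sum(x) / length(x))
)

# Fairness check: candidates must agree before their timings are comparable
stopifnot(isTRUE(all.equal(candidates$colmeans(), candidates$apply_sum())))

res <- manual_bench(candidates, times = 10L)
print(res)
```

Looking at the median and the spread rather than a single run is what makes the comparison less sensitive to one-off disturbances. The elapsed time reported by system.time() is fairly coarse, so very fast expressions may show up as 0 on a small matrix like this one; in that case each candidate would need to do more work per timed call (for example a larger matrix or an inner loop).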