What is a good workflow for detecting performance regressions in R packages? Ideally, I'm looking for something that integrates with R CMD check
that alerts me when I have introduced a significant performance regression in my code.
What is a good workflow in general? What other languages provide good tools? Is it something that can be built on top unit testing, or that is usually done separately?
This is a very challenging question, and one that I'm frequently dealing with, as I swap out different code in a package to speed things up. Sometimes a performance regression comes along with a change in algorithms or implementation, but it may also arise due to changes in the data structures used.
What is a good workflow for detecting performance regressions in R packages?
In my case, I tend to have very specific use cases that I'm trying to speed up, with different fixed data sets. As Spacedman wrote, it's important to have a fixed computing system, but that's almost infeasible: sometimes a shared computer may have other processes that slow things down 10-20%, even when it looks quite idle.
My steps:
Examine the profiling output as vigorously as possible (see this question for some insights, also referencing tools from the OP).
Ideally, I'm looking for something that integrates with R CMD check that alerts me when I have introduced a significant performance regression in my code.
Unfortunately, I don't have an answer for this.
What is a good workflow in general?
For me, it's quite similar to general dynamic code testing: is the output (execution time in this case) reproducible, optimal, and transparent? Transparency comes from understanding what affects the overall time. This is where Mike Dunlavey's suggestions are important, but I prefer to go further, with a line profiler.
Regarding a line profiler, see my previous question , which refers to options in Python and Matlab for other examples. It's most important to examine clock time, but also very important to track memory allocation, number of times the line is executed, and call stack depth.
What other languages provide good tools?
Almost all other languages have better tools. :) Interpreted languages like Python and Matlab have the good & possibly familiar examples of tools that can be adapted for this purpose. Although dynamic analysis is very important, static analysis can help identify where there may be some serious problems. Matlab has a great static analyzer that can report when objects (eg vectors, matrices) are growing inside of loops, for instance. It is terrible to find this only via dynamic analysis - you've already wasted execution time to discover something like this, and it's not always discernible if your execution context is pretty simple (eg just a few iterations, or small objects).
As far as language-agnostic methods, you can look at:
Is it something that can be built on top unit testing, or that is usually done separately?
This is hard to answer. For static analysis, it can occur before unit testing. For dynamic analysis, one may want to add more tests. Think of it as sequential design (ie from an experimental design framework): if the execution costs appear to be, within some statistical allowances for variation, the same, then no further tests are needed. If, however, method B seems to have an average execution cost greater than method A, then one should perform more intensive tests.
Update 1: If I may be so bold, there's another question that I'd recommend including, which is: "What are some gotchas in comparing the execution time of two versions of a package?" This is analogous to assuming that two programs that implement the same algorithm should have the same intermediate objects. That's not exactly true (see this question - not that I'm promoting my own questions, here - it's just hard work to make things better and faster...leading to multiple SO questions on this topic :)). In a similar way, two executions of the same code can differ in time consumed due to factors other than the implementation.
So, some gotchas that can occur, either within the same language or across languages, within the same execution instance or across "identical" instances, which can affect runtime:
The key result is: Ideally, how should we test for differences in expected values, subject to the randomness created due to order effects? Well, pretty simple: go back to experimental design. :)
When the empirical differences in execution times are different from the "expected" differences, it's great to have enabled additional system and execution monitoring so that we don't have to re-run the experiments until we're blue in the face.
The only way to do anything here is to make some assumptions. So let us assume an unchanged machine, or else require a 'recalibration'.
Then use a unit-test alike framework, and treat 'has to be done in X units of time' as just yet another testing criterion to be fulfilled. In other words, do something like
stopifnot( timingOf( someExpression ) < savedValue plus fudge)
so we would have to associate prior timings with given expressions. Equality-testing comparisons from any one of the three existing unit testing packages could be used as well.
Nothing that Hadley couldn't handle so I think we can almost expect a new package timr
after the next long academic break :). Of course, this has to be either be optional because on a "unknown" machine (think: CRAN testing the package) we have no reference point, or else the fudge factor has to "go to 11" to automatically accept on a new machine.
A recent change announced on the R-devel feed could give a crude measure for this.
CHANGES IN R-devel UTILITIES
'R CMD check' can optionally report timings on various parts of the check: this is controlled by environment variables documented in 'Writing R Extensions'.
See http://developer.r-project.org/blosxom.cgi/R-devel/2011/12/13#n2011-12-13
The overall time spent running the tests could be checked and compared to previous values. Of course, adding new tests will increase the time, but dramatic performance regressions could still be seen, albeit manually.
This is not as fine grained as timing support within individual test suites, but it also does not depend on any one specific test suite.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.