I am currently trying to benchmark various implementations of large loop performing arbitrary jobs, and I found myself with a very slow version when using boost transform iterators and boost counting_iterators.
I designed a small code that benchmark two loops that sums the product of all integers between 0 and SIZE-1 with an arbitrary integer (that I choose to be 1 in my example in order to avoid overflow).
Her's my code:
//STL
#include <iostream>
#include <algorithm>
#include <functional>
#include <chrono>
//Boost
#include <boost/iterator/transform_iterator.hpp>
#include <boost/iterator/counting_iterator.hpp>
//Compile using
// g++ ./main.cpp -o test -std=c++11
//Launch using
// ./test 1
#define NRUN 10
#define SIZE 128*1024*1024
struct MultiplyByN
{
MultiplyByN( size_t N ): m_N(N){};
size_t operator()(int i) const { return i*m_N; }
const size_t m_N;
};
int main(int argc, char* argv[] )
{
int N = std::stoi( argv[1] );
size_t sum = 0;
//Initialize chrono helpers
auto start = std::chrono::steady_clock::now();
auto stop = std::chrono::steady_clock::now();
auto diff = stop - start;
double msec=std::numeric_limits<double>::max(); //Set min runtime to ridiculously high value
MultiplyByN op(N);
//Perform multiple run in order to get minimal runtime
for(int k = 0; k< NRUN; k++)
{
sum = 0;
start = std::chrono::steady_clock::now();
for(int i=0;i<SIZE;i++)
{
sum += op(i);
}
stop = std::chrono::steady_clock::now();
diff = stop - start;
//Compute minimum runtime
msec = std::min( msec, std::chrono::duration<double, std::milli>(diff).count() );
}
std::cout << "First version : Sum of values is "<< sum << std::endl;
std::cout << "First version : Minimal Runtime was "<< msec << " msec "<< std::endl;
msec=std::numeric_limits<double>::max(); //Reset min runtime to ridiculously high value
//Perform multiple run in order to get minimal runtime
for(int k = 0; k< NRUN; k++)
{
start = std::chrono::steady_clock::now();
//Functional way to express the summation
sum = std::accumulate( boost::make_transform_iterator(boost::make_counting_iterator(0), op ),
boost::make_transform_iterator(boost::make_counting_iterator(SIZE), op ),
(size_t)0, std::plus<size_t>() );
stop = std::chrono::steady_clock::now();
diff = stop - start;
//Compute minimum runtime
msec = std::min( msec, std::chrono::duration<double, std::milli>(diff).count() );
}
std::cout << "Second version : Sum of values is "<< sum << std::endl;
std::cout << "Second version version : Minimal Runtime was "<< msec << " msec "<< std::endl;
return EXIT_SUCCESS;
}
And the output I get:
./test 1
First version : Sum of values is 9007199187632128
First version : Minimal Runtime was 433.142 msec
Second version : Sum of values is 9007199187632128
Second version version : Minimal Runtime was 10910.7 msec
The "functional" version of my loop that uses std::accumulate is 25 times slower than the simple loop version, why so ?
Thank you in advance for your help
Based on your comment in the code, you've compiled this with
g++ ./main.cpp -o test -std=c++11
Since you didn't specify the optimization level, g++ used the default setting, which is -O0
ie no optimization.
That means that the compiler didn't inline anything. Template libraries like the standard library or boost depend on inlining for performance. Additionally, the compiler will produce a lot of extra code, that's far from optimal -- it doesn't make any sense to make performance comparisons on such binaries.
Recompile with optimization enabled, and try your test again to get meaningful results.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.