Multi-threaded performance std::string
We are running some code on a project that uses OpenMP and I've run into something strange. I've included parts of some play code that demonstrates what I see.
The tests compare calling a function with a const char* argument against calling one with a std::string argument in a multi-threaded loop. The functions essentially do nothing and so have no overhead.
What I do see is a major difference in the time it takes to complete the loops. For the const char* version doing 100,000,000 iterations the code takes 0.075 seconds to complete, compared with 5.08 seconds for the std::string version. These tests were done on Ubuntu-10.04-x64 with gcc-4.4.
My question is basically whether this is solely due to the dynamic allocation of std::string, and why in this case that can't be optimized away, since it is const and can't change?
Code below and many thanks for your responses.
Compiled with: g++ -Wall -Wextra -O3 -fopenmp string_args.cpp -o string_args
#include <iostream>
#include <map>
#include <string>
#include <stdint.h>
// For wall time
#ifdef _WIN32
#include <time.h>
#else
#include <sys/time.h>
#endif
namespace
{
  const int64_t g_max_iter = 100000000;
  std::map<const char*, int> g_charIndex = std::map<const char*,int>();
  std::map<std::string, int> g_strIndex = std::map<std::string,int>();

  class Timer
  {
  public:
    Timer()
    {
#ifdef _WIN32
      m_start = clock();
#else /* linux & mac */
      gettimeofday(&m_start,0);
#endif
    }

    float elapsed()
    {
#ifdef _WIN32
      clock_t now = clock();
      const float retval = float(now - m_start)/CLOCKS_PER_SEC;
      m_start = now;
#else /* linux & mac */
      timeval now;
      gettimeofday(&now,0);
      const float retval = float(now.tv_sec - m_start.tv_sec) + float((now.tv_usec - m_start.tv_usec)/1E6);
      m_start = now;
#endif
      return retval;
    }

  private:
    // The type of this variable is different depending on the platform
#ifdef _WIN32
    clock_t
#else
    timeval
#endif
    m_start; ///< The starting time (implementation dependent format)
  };
}
bool contains_char(const char * id)
{
  if( g_charIndex.empty() ) return false;
  return (g_charIndex.find(id) != g_charIndex.end());
}

bool contains_str(const std::string & name)
{
  if( g_strIndex.empty() ) return false;
  return (g_strIndex.find(name) != g_strIndex.end());
}
void do_serial_char()
{
  int found(0);
  Timer clock;
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_char("pos") )
    {
      ++found;
    }
  }
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}
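void do_parallel_char()
{
  int found(0);
  Timer clock;
  // reduction gives each thread its own copy of found and sums them at the end,
  // which avoids a data race on the shared counter
#pragma omp parallel for reduction(+:found)
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_char("pos") )
    {
      ++found;
    }
  }
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}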
void do_serial_str()
{
  int found(0);
  Timer clock;
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_str("pos") )
    {
      ++found;
    }
  }
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}
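void do_parallel_str()
{
  int found(0);
  Timer clock;
  // reduction gives each thread its own copy of found and sums them at the end,
  // which avoids a data race on the shared counter
#pragma omp parallel for reduction(+:found)
  for( int64_t i = 0; i < g_max_iter ; ++i )
  {
    if( contains_str("pos") )
    {
      ++found;
    }
  }
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}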
int main()
{
  std::cout << "Starting single-threaded loop using std::string\n";
  do_serial_str();
  std::cout << "\nStarting multi-threaded loop using std::string\n";
  do_parallel_str();
  std::cout << "\nStarting single-threaded loop using char *\n";
  do_serial_char();
  std::cout << "\nStarting multi-threaded loop using const char*\n";
  do_parallel_char();
}
My question is basically whether this is solely due to the dynamic allocation of std::string and why in this case that can't be optimized away since it is const and can't change?
Yes, it is due to the allocation and copying for std::string on every iteration: each call to contains_str("pos") constructs a temporary std::string from the string literal.
A sufficiently smart compiler could potentially optimize this, but it is unlikely to happen with current optimizers. Instead, you can hoist the string yourself:
void do_parallel_str()
{
  int found(0);
  Timer clock;
  std::string const str = "pos"; // you can even make it static, if desired
  // reduction avoids a data race on the shared counter found
#pragma omp parallel for reduction(+:found)
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_str(str) )
    {
      ++found;
    }
  }
  //clock.stop(); // Or use something to that effect, so you don't include
  // any of the expression below (such as outputting "Loop time: ") in the timing.
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}
Does changing:
if( contains_str("pos") )
to:
static const std::string str = "pos";
if( contains_str(str) )
change things much? My current best guess is that the implicit constructor call for std::string on every loop iteration introduces a fair bit of overhead, and that optimising it away, whilst possible, is still a sufficiently hard problem, I suspect.
std::string (in your case a temporary) requires dynamic allocation, which is a very slow operation compared to everything else in your loop. There are also old implementations of the standard library that used copy-on-write (COW), which is also slow in a multi-threaded environment. Having said that, there is no reason why the compiler cannot optimize away the temporary string creation and the whole contains_str function call, unless you have some side effects there. Since you didn't provide the implementation of that function, it's impossible to say whether it could be completely optimized away.