Adding stringstream/cout hurts performance, even when the code is never called
I have a program in which a simple function is called a large number of times. I have added some simple logging code and find that this significantly affects performance, even when the logging code is not actually called. A complete (but simplified) test case is shown below:
#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <sstream>
using namespace std::chrono;
std::mt19937 rng;
uint32_t getValue()
{
    // Just some pointless work, helps stop this function from getting inlined.
    for (int x = 0; x < 100; x++)
    {
        rng();
    }
    // Get a value, which happens never to be zero
    uint32_t value = rng();
    // This (by chance) is never true
    if (value == 0)
    {
        value++; // This if statement won't get optimized away when printing below is commented out.
        std::stringstream ss;
        ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
        std::cout << ss.str();
    }
    return value;
}
int main(int argc, char* argv[])
{
    // Just for timing
    high_resolution_clock::time_point start = high_resolution_clock::now();
    uint32_t sum = 0;
    for (uint32_t i = 0; i < 10000000; i++)
    {
        sum += getValue();
    }
    milliseconds elapsed = duration_cast<milliseconds>(high_resolution_clock::now() - start);
    // Use (print) the sum to make sure it doesn't get optimized away.
    std::cout << "Sum = " << sum << ", Elapsed = " << elapsed.count() << "ms" << std::endl;
    return 0;
}
Note that the code contains stringstream and cout, but these are never actually called. However, the presence of these three lines of code increases the run time from 2.9 to 3.3 seconds. This is in release mode on VS2013. Curiously, if I build with GCC using the '-O3' flag, the extra three lines of code actually decrease the run time by half a second or so.
I understand that the extra code could impact the resulting executable in a number of ways, such as by preventing inlining or causing more cache misses. The real question is whether there is anything I can do to improve on this situation. Switching to sprintf()/printf() doesn't seem to make a difference. Do I simply need to accept that adding such logging code to small functions will affect performance even if it is never called?
Note: For completeness, my real/full scenario is that I use a wrapper macro to throw exceptions and I like to log when such an exception is thrown. So when I call THROW_EXCEPT(...) it inserts code similar to that shown above and then throws. This then hurts when I throw exceptions from inside a small function. Are there any better alternatives here?
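To make the THROW_EXCEPT scenario concrete, here is a minimal sketch of one arrangement in which the macro expands to a single call, so the stream code is emitted once in a separate function instead of in every small caller. The original macro is not shown, so everything here is an assumption for illustration: the single-message form of THROW_EXCEPT, the helper name logAndThrow, the COLD_NORETURN macro, the example function halveNonZero, and the choice of std::runtime_error. The attributes used are compiler-specific extensions (MSVC __declspec, GCC/Clang __attribute__).

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>
#include <stdexcept>

// Compiler-specific annotations: keep the helper out of line and tell the
// optimizer it never returns. Falls back to nothing on other compilers.
#if defined(_MSC_VER)
#  define COLD_NORETURN __declspec(noinline) __declspec(noreturn)
#elif defined(__GNUC__)
#  define COLD_NORETURN __attribute__((noinline, cold, noreturn))
#else
#  define COLD_NORETURN
#endif

// Hypothetical helper: all stringstream/iostream code lives here, once.
// Taking const char* keeps the call site down to a few pushes and a call.
COLD_NORETURN void logAndThrow(const char* file, int line, const char* message)
{
    std::stringstream ss;
    ss << file << "(" << line << "): " << message << '\n';
    std::cerr << ss.str();
    throw std::runtime_error(message);
}

// Hypothetical single-argument form of the wrapper macro.
#define THROW_EXCEPT(message) logAndThrow(__FILE__, __LINE__, (message))

// Example of a small function that throws through the macro.
std::uint32_t halveNonZero(std::uint32_t value)
{
    if (value == 0)
    {
        THROW_EXCEPT("value must not be zero");
    }
    return value / 2;
}
```

Whether this recovers the lost time on a given compiler and build configuration would still need to be measured.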
Edit: Here is a VS2013 solution for quick testing, and so compiler settings can be checked: https://drive.google.com/file/d/0B7b4UnjhhIiEamFyS0hjSnVzbGM/view?usp=sharing
So I initially thought that this was due to branch prediction and optimising out branches, so I took a look at the annotated assembly for the version with the logging code commented out:
if (value == 0)
00E21371 mov ecx,1
00E21376 cmove eax,ecx
{
value++;
Here we see that the compiler has helpfully optimised out our branch. So what if we put in a more complex statement to prevent it from doing so:
if (value == 0)
00AE1371 jne getValue+99h (0AE1379h)
{
value /= value;
00AE1373 xor edx,edx
00AE1375 xor ecx,ecx
00AE1377 div eax,ecx
Here the branch is left in, but this version runs about as fast as the previous example, in which the logging lines were commented out. So let's have a look at the assembly with those lines left in:
if (value == 0)
008F13A0 jne getValue+20Bh (08F14EBh)
{
value++;
std::stringstream ss;
008F13A6 lea ecx,[ebp-58h]
008F13A9 mov dword ptr [ss],8F32B4h
008F13B3 mov dword ptr [ebp-0B0h],8F32F4h
008F13BD call dword ptr ds:[8F30A4h]
008F13C3 push 0
008F13C5 lea eax,[ebp-0A8h]
008F13CB mov dword ptr [ebp-4],0
008F13D2 push eax
008F13D3 lea ecx,[ss]
008F13D9 mov dword ptr [ebp-10h],1
008F13E0 call dword ptr ds:[8F30A0h]
008F13E6 mov dword ptr [ebp-4],1
008F13ED mov eax,dword ptr [ss]
008F13F3 mov eax,dword ptr [eax+4]
008F13F6 mov dword ptr ss[eax],8F32B0h
008F1401 mov eax,dword ptr [ss]
008F1407 mov ecx,dword ptr [eax+4]
008F140A lea eax,[ecx-68h]
008F140D mov dword ptr [ebp+ecx-0C4h],eax
008F1414 lea ecx,[ebp-0A8h]
008F141A call dword ptr ds:[8F30B0h]
008F1420 mov dword ptr [ebp-4],0FFFFFFFFh
That's a lot of instructions if that branch is ever hit. So what if we try something else?
if (value == 0)
011F1371 jne getValue+0A6h (011F1386h)
{
value++;
printf("This never gets printed, but commenting out these three lines improves performance.");
011F1373 push 11F31D0h
011F1378 call dword ptr ds:[11F30ECh]
011F137E add esp,4
Here we have far fewer instructions, and once again it runs as quickly as with all lines commented out.
So I'm not sure I can say for certain exactly what is happening here, but at the moment I feel it is a combination of branch prediction and CPU instruction cache misses.
In order to solve this problem you could move the logging into a function like so:
void log()
{
    std::stringstream ss;
    ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
    std::cout << ss.str();
}
and
if (value == 0)
{
    value++;
    log();
}
Then it runs as fast as before, with all those instructions replaced by a single call log (011C12E0h).
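One caveat with the approach above is that an optimizer is free to inline the logging function straight back into its caller, which would undo the change. A minimal sketch of guarding against that with compiler-specific attributes follows. The macro name COLD_NOINLINE, the function name logUnexpectedZero and the message text are assumptions introduced for illustration. __declspec(noinline) is the MSVC spelling, and __attribute__((noinline, cold)) is the GCC/Clang spelling.

```cpp
#include <cstdint>
#include <iostream>
#include <random>
#include <sstream>

// Compiler-specific request to keep the function out of line.
// On GCC/Clang, "cold" additionally marks it as rarely executed.
#if defined(_MSC_VER)
#  define COLD_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#  define COLD_NOINLINE __attribute__((noinline, cold))
#else
#  define COLD_NOINLINE
#endif

std::mt19937 rng;

// Hypothetical out-of-line logger: the stream setup and teardown code
// lives here rather than in the body of getValue().
COLD_NOINLINE void logUnexpectedZero()
{
    std::stringstream ss;
    ss << "Unexpected zero value" << std::endl;
    std::cout << ss.str();
}

std::uint32_t getValue()
{
    // Same pointless work as in the question's test case.
    for (int x = 0; x < 100; x++)
    {
        rng();
    }
    std::uint32_t value = rng();
    if (value == 0)
    {
        value++;
        logUnexpectedZero(); // the hot function carries only this call
    }
    return value;
}
```

Checking the disassembly of getValue() after the change, as done above, is the most direct way to confirm that only a single call remains on the cold path.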