Adding stringstream / cout hurts performance, even when the code is never called.

Question

I have a program in which a simple function is called a large number of times. I have added some simple logging code and find that this significantly affects performance, even when the logging code is not actually called. A complete (but simplified) test case is shown below:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <sstream>

using namespace std::chrono;

std::mt19937 rng;

uint32_t getValue()
{
    // Just some pointless work, helps stop this function from getting inlined.
    for (int x = 0; x < 100; x++)
    {
        rng();
    }

    // Get a value, which happens never to be zero
    uint32_t value = rng();

    // This (by chance) is never true
    if (value == 0)
    {
        value++; // This if statement won't get optimized away when the printing below is commented out.

        std::stringstream ss;
        ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
        std::cout << ss.str();
    }

    return value;
}

int main(int argc, char* argv[])
{
    // Just for timing
    high_resolution_clock::time_point start = high_resolution_clock::now();

    uint32_t sum = 0;   
    for (uint32_t i = 0; i < 10000000; i++)
    {
        sum += getValue();  
    }

    milliseconds elapsed = duration_cast<milliseconds>(high_resolution_clock::now() - start);

    // Use (print) the sum to make sure it doesn't get optimized away.
    std::cout << "Sum  = " << sum << ", Elapsed = " << elapsed.count() << "ms" << std::endl;
    return 0;
}

Note that the code contains stringstream and cout but these are never actually called. However, the presence of these three lines of code increases the run time from 2.9 to 3.3 seconds. This is in release mode on VS2013. Curiously, if I build with GCC using the '-O3' flag, the extra three lines of code actually decrease the run time by half a second or so.

I understand that the extra code could impact the resulting executable in a number of ways, such as by preventing inlining or causing more cache misses. The real question is whether there is anything I can do to improve on this situation. Switching to sprintf()/printf() doesn't seem to make a difference. Do I simply need to accept that adding such logging code to small functions will affect performance even if it is never called?

Note: For completeness, my real/full scenario is that I use a wrapper macro to throw exceptions and I like to log when such an exception is thrown. So when I call THROW_EXCEPT(...) it inserts code similar to that shown above and then throws. This then hurts when I throw exceptions from inside a small function. Are there any better alternatives here?

Edit: Here is a VS2013 solution for quick testing, so that compiler settings can be checked: https://drive.google.com/file/d/0B7b4UnjhhIiEamFyS0hjSnVzbGM/view?usp=sharing

Answer 1

So I initially thought that this was due to branch prediction and optimising out branches, so I took a look at the annotated assembly for when the code is commented out:

    if (value == 0)
00E21371  mov         ecx,1  
00E21376  cmove       eax,ecx  
    {
        value++;

Here we see that the compiler has helpfully optimised out our branch (it uses a conditional move instead of a jump), so what if we put in a more complex statement to prevent it from doing so:

if (value == 0)
00AE1371  jne         getValue+99h (0AE1379h)  
    {
        value /= value;
00AE1373  xor         edx,edx  
00AE1375  xor         ecx,ecx  
00AE1377  div         eax,ecx

Here the branch is left in, but it runs about as fast as the previous example with the following lines commented out. So let's have a look at the assembly with those lines left in:

if (value == 0)
008F13A0  jne         getValue+20Bh (08F14EBh)  
    {
        value++;     
        std::stringstream ss;
008F13A6  lea         ecx,[ebp-58h]  
008F13A9  mov         dword ptr [ss],8F32B4h  
008F13B3  mov         dword ptr [ebp-0B0h],8F32F4h  
008F13BD  call        dword ptr ds:[8F30A4h]  
008F13C3  push        0  
008F13C5  lea         eax,[ebp-0A8h]  
008F13CB  mov         dword ptr [ebp-4],0  
008F13D2  push        eax  
008F13D3  lea         ecx,[ss]  
008F13D9  mov         dword ptr [ebp-10h],1  
008F13E0  call        dword ptr ds:[8F30A0h]  
008F13E6  mov         dword ptr [ebp-4],1  
008F13ED  mov         eax,dword ptr [ss]  
008F13F3  mov         eax,dword ptr [eax+4]  
008F13F6  mov         dword ptr ss[eax],8F32B0h  
008F1401  mov         eax,dword ptr [ss]  
008F1407  mov         ecx,dword ptr [eax+4]  
008F140A  lea         eax,[ecx-68h]  
008F140D  mov         dword ptr [ebp+ecx-0C4h],eax  
008F1414  lea         ecx,[ebp-0A8h]  
008F141A  call        dword ptr ds:[8F30B0h]  
008F1420  mov         dword ptr [ebp-4],0FFFFFFFFh

That's a lot of instructions if that branch is ever hit. So what if we try something else?

    if (value == 0)
011F1371  jne         getValue+0A6h (011F1386h)  
    {
        value++;
        printf("This never gets printed, but commenting out these three lines improves performance.");
011F1373  push        11F31D0h  
011F1378  call        dword ptr ds:[11F30ECh]  
011F137E  add         esp,4

Here we have far fewer instructions, and once again it runs as quickly as with all lines commented out.

So I'm not sure I can say for certain exactly what is happening here, but at the moment I feel it is a combination of branch prediction and CPU instruction cache misses.

In order to solve this problem you could move the logging into a function like so:

void log()
{
    std::stringstream ss;
    ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
    std::cout << ss.str();
}

and

if (value == 0)
{
    value++;
    log();
}

Then it runs as fast as before, with all those instructions replaced by a single call log (011C12E0h).
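For the THROW_EXCEPT(...) scenario mentioned in the question, the same idea might look something like the sketch below. This is only an illustration under assumptions: the question never shows the real macro, so the helper name logAndThrow, the NOINLINE portability macro, the use of std::runtime_error and the example function checkedValue are all hypothetical. The point is that the macro expands to a single call, and everything involving streams lives in one out-of-line function that the compiler is asked not to inline back into the caller.

```cpp
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical portability macro (not from the original post): asks the
// compiler to keep the function out of line.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

// Cold path: stream construction, formatting and the throw all live here,
// so callers only contain a compare, a branch and a call.
// Plain const char* parameters avoid building a std::string at the call site.
NOINLINE void logAndThrow(const char* file, int line, const char* msg)
{
    std::stringstream ss;
    ss << file << ":" << line << ": " << msg << '\n';
    std::cerr << ss.str();
    throw std::runtime_error(msg);
}

// Assumed shape of the wrapper macro; the real THROW_EXCEPT is not shown
// in the question.
#define THROW_EXCEPT(msg) logAndThrow(__FILE__, __LINE__, (msg))

// Example of a small, frequently called function using the macro.
unsigned checkedValue(unsigned value)
{
    if (value == 0)
    {
        THROW_EXCEPT("value was zero");
    }
    return value;
}
```

Calling checkedValue(0) logs to std::cerr and throws std::runtime_error, while the common path just returns its argument. Whether this recovers the lost time on a given compiler is something you would need to measure; I have only checked the plain log() version above.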


Answer 1 (1, accepted): 2015-04-15 15:32:10
