Performance comparison between variable declaration within or outside nested loop

Case I

while (!channels_.empty())
{
    for (auto it = channels_.begin(); it != channels_.end(); ++it)
    {
        time_t stop_time;
        if (it->second->active_playlist()->status_stop_time(it->second->on_air_row(), stop_time))
        {
        }
    }
}

Case II

while (!channels_.empty())
{
    time_t stop_time;
    for (auto it = channels_.begin(); it != channels_.end(); ++it)
    {
        if (it->second->active_playlist()->status_stop_time(it->second->on_air_row(), stop_time))
        {
        }
    }
}

The variable stop_time is declared within the nested loop in Case I and outside it in Case II. Which one is better in terms of performance, and why?

The standard has very little to say about performance, but there is one clear rule: optimizations must not change the observable behavior of the program (the "as-if" rule).
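To make that rule concrete, here is a minimal sketch using a hypothetical type, Tracked, whose constructor has a visible effect (it bumps a global counter). Neither Tracked nor the helper functions come from the question; they are assumptions for illustration only. For a plain time_t there is nothing to observe, so the compiler is free to treat both placements the same. For a type like this one, the placement changes what the program must do.

```cpp
// Global counter that makes construction observable (illustrative only).
int g_constructions = 0;

// Hypothetical type: each construction increments the counter.
struct Tracked
{
  Tracked() { ++g_constructions; }
};

// Declared inside the loop: the program must behave as if the
// constructor ran once per iteration.
int construct_inside(int iterations)
{
  g_constructions = 0;
  for (int i = 0; i < iterations; ++i)
  {
    Tracked t;
    (void)t;  // silence unused-variable warnings
  }
  return g_constructions;
}

// Declared outside the loop: the constructor runs exactly once.
int construct_outside(int iterations)
{
  g_constructions = 0;
  Tracked t;
  (void)t;
  for (int i = 0; i < iterations; ++i)
  {
    // loop body would use t here
  }
  return g_constructions;
}
```

Calling construct_inside(5) reports five constructions and construct_outside(5) reports one, whatever the optimization level, because the counter value is part of the program's observable result.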

The standard does not describe stack or heap usage, so it would be perfectly legitimate for a compiler to allocate space for a variable on the stack at any point prior to its use.

What is optimal depends on various factors. On the most common architectures, it makes the most sense to do all stack pointer adjustments in just two places: function entry and exit. On x86, adjusting the stack pointer by 640 costs no more than adjusting it by 8.

Further, if the compiler can prove that the value doesn't change between iterations, the optimizer may be able to hoist the assignment out of the loop too.
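As a rough illustration of what hoisting means, the sketch below writes the transformation out by hand. The function names and the bias computation are made up for this example. An optimizer may or may not perform the same rewrite, depending on what it can prove about the code.

```cpp
#include <vector>

// As written by the programmer: bias is recomputed on every iteration,
// although it does not depend on the loop variable.
long sum_naive(const std::vector<int>& values, int scale, int offset)
{
  long total = 0;
  for (int v : values)
  {
    const int bias = scale * offset;  // loop-invariant expression
    total += v + bias;
  }
  return total;
}

// The same function after hoisting the invariant by hand:
// bias is computed once, before the loop.
long sum_hoisted(const std::vector<int>& values, int scale, int offset)
{
  const int bias = scale * offset;
  long total = 0;
  for (int v : values)
  {
    total += v + bias;
  }
  return total;
}
```

Both functions return the same result for the same inputs, which is what makes the transformation legal for the compiler to apply on its own.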

In practice, mainstream compilers (GCC, Clang, MSVC) on x86 and ARM platforms will combine stack allocations into a single adjustment on entry and a single one on exit, and will hoist loop invariants, given sufficient optimization settings.

If in doubt, inspect the assembly or benchmark.

We can quickly demonstrate this with Godbolt (Compiler Explorer):

#include <vector>

struct Channel
{
  void test(int&);
};

std::vector<Channel> channels;

void test1()
{
  while (!channels.empty())
  {
    for (auto&& channel : channels)
    {
      int stop_time;
      channel.test(stop_time);
    }
  }
}

void test2()
{
  while (!channels.empty())
  {
    int stop_time;
    for (auto&& channel : channels)
    {
      channel.test(stop_time);
    }
  }
}

void test3()
{
  int stop_time;
  while (!channels.empty())
  {
    for (auto&& channel : channels)
    {
      channel.test(stop_time);
    }
  }
}

With GCC 5.1 and -O3, this generates three pieces of assembly that are identical apart from their label names:

    test1():
            pushq   %rbp
            pushq   %rbx
            subq    $24, %rsp
    .L8:
            movq    channels+8(%rip), %rbp
            movq    channels(%rip), %rbx
            cmpq    %rbp, %rbx
            je      .L10
    .L7:
            leaq    12(%rsp), %rsi
            movq    %rbx, %rdi
            addq    $1, %rbx
            call    Channel::test(int&)
            cmpq    %rbx, %rbp
            jne     .L7
            jmp     .L8
    .L10:
            addq    $24, %rsp
            popq    %rbx
            popq    %rbp
            ret

    test2():
            pushq   %rbp
            pushq   %rbx
            subq    $24, %rsp
    .L22:
            movq    channels+8(%rip), %rbp
            movq    channels(%rip), %rbx
            cmpq    %rbp, %rbx
            je      .L20
    .L14:
            leaq    12(%rsp), %rsi
            movq    %rbx, %rdi
            addq    $1, %rbx
            call    Channel::test(int&)
            cmpq    %rbx, %rbp
            jne     .L14
            jmp     .L22
    .L20:
            addq    $24, %rsp
            popq    %rbx
            popq    %rbp
            ret

    test3():
            pushq   %rbp
            pushq   %rbx
            subq    $24, %rsp
    .L26:
            movq    channels+8(%rip), %rbp
            movq    channels(%rip), %rbx
            cmpq    %rbp, %rbx
            je      .L28
    .L25:
            leaq    12(%rsp), %rsi
            movq    %rbx, %rdi
            addq    $1, %rbx
            call    Channel::test(int&)
            cmpq    %rbx, %rbp
            jne     .L25
            jmp     .L26
    .L28:
            addq    $24, %rsp
            popq    %rbx
            popq    %rbp
            ret

A general answer for many performance-related questions is "measure it and see for yourself". If it's easy, just do it. In this case, it is easy.
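A measurement along these lines could look like the sketch below. FakeChannel, the two summing functions, and elapsed_ns are all stand-ins invented for this example, not the types from the question, and a single std::chrono::steady_clock reading is only a rough timer. The optimizer may also simplify such a small loop heavily, so treat the numbers as indicative at best.

```cpp
#include <chrono>
#include <ctime>
#include <vector>

// Hypothetical stand-in for the question's channel objects.
struct FakeChannel
{
  std::time_t stop;

  // Writes the stop time to out; reports whether one is set.
  bool status_stop_time(std::time_t& out) const
  {
    out = stop;
    return stop != 0;
  }
};

// Case I: stop_time declared inside the inner loop.
long long sum_decl_inside(const std::vector<FakeChannel>& channels, int rounds)
{
  long long total = 0;
  for (int r = 0; r < rounds; ++r)
  {
    for (const auto& channel : channels)
    {
      std::time_t stop_time;
      if (channel.status_stop_time(stop_time))
        total += static_cast<long long>(stop_time);
    }
  }
  return total;
}

// Case II: stop_time declared outside the inner loop.
long long sum_decl_outside(const std::vector<FakeChannel>& channels, int rounds)
{
  long long total = 0;
  for (int r = 0; r < rounds; ++r)
  {
    std::time_t stop_time;
    for (const auto& channel : channels)
    {
      if (channel.status_stop_time(stop_time))
        total += static_cast<long long>(stop_time);
    }
  }
  return total;
}

// Runs fn once and returns the elapsed wall-clock time in nanoseconds.
template <typename Fn>
long long elapsed_ns(Fn&& fn)
{
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

To compare the two cases, pass each function to elapsed_ns wrapped in a lambda, use the returned sums so the work cannot be discarded, and repeat the measurement several times before drawing any conclusion.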

Sometimes it's worth looking at the assembly code: if the generated code is identical in your two cases (I guess it is), you don't even need to measure its performance.
