
Performance issue for vector::size() in a loop in C++

In the following code:

std::vector<int> var;
for (int i = 0; i < var.size(); i++);

Is the size() member function called for each loop iteration, or only once?

In theory, it is called each time, since a for loop:

for(initialization; condition; increment)
    body;

is expanded to something like

{
    initialization;
    while(condition)
    {
        body;
        increment;
    }
}

(notice the curly braces, because initialization is already in an inner scope)
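As a rough illustration of that expansion, here is a minimal sketch that counts how often the condition runs. CountingVec is a hypothetical wrapper invented for this example (it is not a standard type), and count_size_calls is likewise just an illustrative name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper (illustration only) that counts calls to size().
struct CountingVec {
    std::vector<int> data;
    mutable std::size_t size_calls;

    explicit CountingVec(std::size_t n) : data(n), size_calls(0) {}

    std::size_t size() const {
        ++size_calls;          // record every evaluation of the condition
        return data.size();
    }
};

// Runs an empty loop over n elements and returns how many times
// the loop condition called size().
std::size_t count_size_calls(std::size_t n) {
    CountingVec var(n);
    for (std::size_t i = 0; i < var.size(); ++i) {
        // empty body
    }
    return var.size_calls;
}
```

For n elements the condition is evaluated n + 1 times: once per iteration plus the final check that ends the loop. Because the counter is an observable result here, the compiler cannot drop those evaluations.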

In practice, if the compiler can prove that part of your condition is invariant for the whole duration of the loop and has no side effects, it can be smart enough to move it out. This is routinely done with strlen and similar functions (which the compiler knows well) in loops where the argument isn't written to.

However, it must be noted that this last condition isn't always trivial to prove. In general, it's easy if the container is local to the function and is never passed to external functions. If the container is not local (e.g. it's passed by reference, even a const one) and the loop body contains calls to other functions, the compiler often has to assume that such functions may alter it, which blocks the hoisting of the length calculation.

Doing that optimization by hand is worthwhile if you know that part of your condition is "expensive" to evaluate (and such a condition usually isn't, since it usually boils down to a pointer subtraction, which is almost surely inlined).


As others said, in general with containers it's better to use iterators, but for vectors it's not so important, because random access to elements via operator[] is guaranteed to be O(1). With vectors it is usually a pointer sum (vector base + index) and a dereference, versus the pointer increment (preceding element + 1) and dereference of iterators. Since the target address is the same either way, I don't think you can gain anything from iterators in terms of cache locality (and even if you could, unless you're walking big arrays in tight loops you wouldn't notice that kind of improvement).

For lists and other containers, by contrast, using iterators instead of random access can be really important, since random access may mean walking the list from the start every time, while incrementing an iterator is just a pointer dereference.
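To make the list case concrete, here is a small sketch comparing the two access styles on std::list. The function names sum_by_index and sum_by_iterator are made up for this example; std::list has no operator[], so "random access" is emulated with std::next.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Index-style access: each lookup walks from begin(),
// so the whole loop is O(n^2) in the list length.
long sum_by_index(const std::list<int>& l) {
    long total = 0;
    for (std::size_t i = 0; i < l.size(); ++i)
        total += *std::next(l.begin(), static_cast<std::ptrdiff_t>(i));
    return total;
}

// Iterator-style access: each step follows one link, O(n) overall.
long sum_by_iterator(const std::list<int>& l) {
    long total = 0;
    for (std::list<int>::const_iterator it = l.begin(); it != l.end(); ++it)
        total += *it;
    return total;
}
```

Both functions return the same sum; only the amount of pointer chasing differs.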

It's "called" each time, but I put "called" in quotes because it is most likely just an inlined method call, so you don't have to worry about its performance.

Why not use vector<int>::iterator instead?

The size() member function is called each time, but it would be a really bad implementation that wouldn't inline it, and a strange one where it wouldn't be a simple access of a fixed datum or a subtraction of two pointers.
Anyway, you shouldn't worry yourself with such trivialities until you have profiled your application and found out that this is a bottleneck.

However, what you should pay attention to is:

  1. The correct type for a vector's index is std::vector<T>::size_type.
  2. There are types (some iterators, for example) where i++ might be slower than ++i.

Therefore, the loop should be:

for(vector<int>::size_type i=0; i<var.size(); ++i)
  ...

The problem with your question is that it does not make any sense. A C++ compiler translates some source code into a binary program. The requirement is that the resulting program must preserve observable effects of the code according to the rules of the C++ Standard. This code:

for (int i = 0; i < var.size(); i++); 

simply does not have any observable effect. Moreover, it does not interact with the surrounding code in any way, so the compiler may optimize it away completely; that is, it may generate no corresponding assembly.

To make your question meaningful, you need to specify what happens inside the loop. The problem with

for (int i = 0; i < var.size(); i++) { ... }

is that the answer very much depends on what ... actually is. I believe @MatteoItalia provided a very nice answer; I would just add a description of some experiments I made. Consider the following code:

int g(std::vector<int>&, size_t);

int f(std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0; i < v.size(); i++)
      res += g(v, i);
   return res;
}

First, even though the call to v.size() will almost certainly be inlined with optimizations enabled, and this inlining typically translates into a subtraction of two pointers, it still brings some overhead into the loop. If the compiler cannot prove that the vector size is preserved (which is generally very difficult or even infeasible, as in our case), you end up with unnecessary load and sub (and possibly shift) instructions. The generated assembly of the loop with GCC 9.2, -O3, on x64 was:

.L3:
    mov     rsi, rbx
    mov     rdi, rbp
    add     rbx, 1
    call    g(std::vector<int, std::allocator<int> >&, unsigned long)
    add     r12d, eax
    mov     rax, QWORD PTR [rbp+8] // loads a pointer
    sub     rax, QWORD PTR [rbp+0] // subtracts another pointer
    sar     rax, 2                 // result / sizeof(int) => size()
    cmp     rbx, rax
    jb      .L3

If we rewrite the code as follows:

int g(std::vector<int>&, size_t);

int f(std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0, e = v.size(); i < e; i++)
      res += g(v, i);
   return res;
}

then, the generated assembly is simpler (and, therefore, faster):

.L3:
    mov     rsi, rbx
    mov     rdi, r13
    add     rbx, 1
    call    g(std::vector<int, std::allocator<int> >&, unsigned long)
    add     r12d, eax
    cmp     rbx, rbp
    jne     .L3

The value of the vector's size is simply kept in a register (rbp).

I even tried a different version where the vector is marked as being const:

int g(const std::vector<int>&, size_t);

int f(const std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0; i < v.size(); i++)
      res += g(v, i);
   return res;
}

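Perhaps surprisingly, even though v cannot be modified through the const reference here, the generated assembly was the same as in the first case (with the additional mov, sub, and sar instructions). A const reference does not guarantee that the underlying vector stays unchanged: g could still reach it through another, non-const path, so the compiler cannot assume the size is fixed.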
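One way to see why the const version cannot be hoisted is the following sketch. The global named shared and this particular body of g are assumptions made up for illustration; the original g is only declared, never defined.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical global that g() modifies. f() may be handed a
// const reference to this very object.
std::vector<int> shared = {1, 2, 3, 4};

int g(const std::vector<int>& v, std::size_t i) {
    int value = v[i];
    if (shared.size() > 1)
        shared.pop_back();   // shrinks the vector f() may be iterating over
    return value;
}

int f(const std::vector<int>& v) {
    int res = 0;
    for (std::size_t i = 0; i < v.size(); i++)   // size() must be re-read
        res += g(v, i);
    return res;
}
```

Calling f(shared) visits only the first two elements, because the vector shrinks underneath the const reference. A version that cached the initial size of 4 would index past the end, which is exactly what the compiler has to rule out before hoisting.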

Live demo is here .

Additionally, when I changed the loop into:

for (size_t i = 0; i < v.size(); i++)
   res += v[i];

then there was no evaluation of v.size() (subtraction of pointers) within the loop at the assembly level. Here GCC was able to "see" that the body of the loop does not alter the size in any way.

It must be called every time, because size() might return a different value on each call.

Therefore there is no real choice: it simply must be.
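A minimal sketch of a loop where size() really does change between iterations. The function name visit_while_shrinking and the pop_back rule are invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Drops the last element whenever the current element is even,
// so size() shrinks while the loop is running.
// Returns how many elements were actually visited.
std::size_t visit_while_shrinking(std::vector<int> v) {
    std::size_t visited = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        ++visited;
        if (v[i] % 2 == 0)
            v.pop_back();    // changes what size() returns next time
    }
    return visited;
}
```

With the input {0, 1, 2, 3, 4, 5} the loop visits four elements and then stops, because two elements have been removed by then. Had the size been read once up front, the index would have run past the end of the shrunken vector.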

As others have said

  • the semantics must be as if it were called each time
  • it is probably inlined, and is probably a simple function

on top of which

  • a smart enough optimizer may be able to deduce that it is a loop invariant with no side effects and elide it entirely (this is easier if the code is inlined, but may be possible even if it is not if the compiler does global optimization)

I think that if the compiler can conclusively deduce that the variable var is not modified inside the "loop body"

for(int i=0; i< var.size();i++) { 
    // loop body
}

then the above may be transformed into something equivalent to

const size_t var_size = var.size();
for( int i = 0; i < var_size; i++ ) { 
    // loop body
}

However, I am not absolutely sure, so comments are welcome :)

Also,

  • In most situations, the size() member function is inlined, so the issue does not warrant worrying

  • The concern is perhaps equally applicable to end(), which is always used for iterator-based looping, i.e. it != container.end()

  • Please consider using size_t or vector<int>::size_type for the type of i [See Steve Jessop's comment below.]
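For the end() point in the list above, here is a hedged sketch of two ways to evaluate end() only once. The function names are illustrative, and both variants assume the loop body does not invalidate iterators.

```cpp
#include <vector>

// Iterator loop that evaluates end() once, in the init-statement.
int sum_cached_end(const std::vector<int>& v) {
    int total = 0;
    for (std::vector<int>::const_iterator it = v.begin(), e = v.end(); it != e; ++it)
        total += *it;
    return total;
}

// A range-based for loop (C++11) also obtains begin and end once, up front.
int sum_range_for(const std::vector<int>& v) {
    int total = 0;
    for (int x : v)
        total += x;
    return total;
}
```

As with size(), end() on a vector is typically a trivial inlined accessor, so this is mostly a matter of style unless profiling says otherwise.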

But it could be done this way (provided that the loop only reads or writes elements without changing the size of the vector):

for(vector<int>::size_type i=0, size = var.size(); i < size; ++i) 
{
//do something
}

In the loop above there is just one call to size(), regardless of whether size() is inlined or not.

As others said, the compiler decides what to do with the code as written. The key point is that size() is semantically called each time. But if you want a performance boost, it is best to write your code with some considerations in mind. Your case is one of them; there are others as well, like the difference between these two pieces of code:

for (int i = 0 ; i < n ; ++i)
{
   for ( int j = 0 ; j < n ; ++j)
       printf("%d ", arr[i][j]);
   printf("\n");
}
for (int j = 0 ; j < n ; ++j)
{
   for ( int i = 0 ; i < n ; ++i)
       printf("%d ", arr[i][j]);
   printf("\n");
}

The difference is that the first one accesses memory sequentially, so it stays within the same RAM page and cache line for many consecutive references, while the second one jumps a whole row ahead on every access and will thrash your cache and TLB.
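The two traversal orders can be sketched on a flat buffer as follows. Matrix is a hypothetical helper type for this example, assuming an n x n matrix stored row by row; it does not come from the snippets above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical n x n matrix stored row by row in one contiguous buffer.
struct Matrix {
    std::size_t n;
    std::vector<int> cells;

    explicit Matrix(std::size_t size) : n(size), cells(size * size) {
        for (std::size_t k = 0; k < cells.size(); ++k)
            cells[k] = static_cast<int>(k);
    }
    int at(std::size_t i, std::size_t j) const { return cells[i * n + j]; }
};

// Inner loop walks along a row: consecutive addresses.
long sum_row_major(const Matrix& m) {
    long total = 0;
    for (std::size_t i = 0; i < m.n; ++i)
        for (std::size_t j = 0; j < m.n; ++j)
            total += m.at(i, j);
    return total;
}

// Inner loop walks down a column: each access is n elements apart.
long sum_col_major(const Matrix& m) {
    long total = 0;
    for (std::size_t j = 0; j < m.n; ++j)
        for (std::size_t i = 0; i < m.n; ++i)
            total += m.at(i, j);
    return total;
}
```

Both functions compute the same sum; any speed difference comes only from the memory access pattern and tends to show up once the matrix no longer fits in cache.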

Also, inlining won't help that much, because the function will still be evaluated n times (n being the size of the vector). It helps in some places, but the best thing is to rewrite your code.

But if you want to let the compiler do its optimizations over your code, NEVER put volatile on the loop variable, like so:

for(volatile int i = 0 ; i < 100; ++i)

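It prevents the compiler from optimizing accesses to i. Older code sometimes used register instead of volatile as a performance hint:

for(register int i = 0 ; i < 100; ++i)

This asked the compiler to try to keep i in a CPU register rather than in RAM, with no guarantee that it would. Note that register was deprecated in C++11 and removed in C++17, and modern compilers ignore the hint, so today it is of historical interest only.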

I tested it with 900 million inner iterations (30000 × 30000): it took 43 seconds with the pre-calculated size and 42 seconds with the size() call.

If you can guarantee that the vector's size doesn't change in the loop, it is better to use a pre-calculated size; otherwise there is no choice and you must call size().

#include <cstddef>
#include <iostream>
#include <vector>

using namespace std;

int main() {
    vector<int> v;

    for (int i = 0; i < 30000; i++)
        v.push_back(i);

    // Variant 1: size computed once before the loops.
    const size_t v_size = v.size();
    for (size_t i = 0; i < v_size; i++)
        for (size_t j = 0; j < v_size; j++)
            cout << "";

    // Variant 2: size() called in every loop condition.
    //for (size_t i = 0; i < v.size(); i++)
    //    for (size_t j = 0; j < v.size(); j++)
    //        cout << "";
}
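A possible way to time the two variants inside one program is sketched below. It uses std::chrono::steady_clock instead of an external stopwatch and replaces the empty cout write with a volatile counter so that the loops are not optimized away. The function names are made up here, and the numbers it produces will depend on compiler and flags.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Nested loops that call size() in every condition.
std::size_t loop_calling_size(const std::vector<int>& v) {
    volatile std::size_t count = 0;   // volatile keeps the loops alive
    for (std::size_t i = 0; i < v.size(); i++)
        for (std::size_t j = 0; j < v.size(); j++)
            count = count + 1;
    return count;
}

// Same loops with the size read once up front.
std::size_t loop_cached_size(const std::vector<int>& v) {
    volatile std::size_t count = 0;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < n; j++)
            count = count + 1;
    return count;
}

// Times a single call of f(v) and returns the elapsed milliseconds.
template <typename F>
double time_ms(F f, const std::vector<int>& v) {
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    f(v);
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, time_ms(loop_calling_size, v) and time_ms(loop_cached_size, v) can be compared for the same v. A single run is noisy, so repeating the measurement several times gives a fairer picture.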
