
What does std::vector look like in memory?

I read that std::vector should be contiguous. My understanding is that its elements should be stored together, not spread out across memory. I have simply accepted that and relied on it, for example when using its data() method to get the underlying contiguous block of memory.

However, I came across a situation where the vector's memory behaves strangely:

std::vector<int> numbers;
std::vector<int*> ptr_numbers;
for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
}

I expected this to give me a vector of some numbers and a vector of pointers to these numbers. However, when listing the contents of the ptr_numbers pointers, there are different and seemingly random numbers, as though I am accessing wrong parts of memory.

I have tried to check the contents every step:

for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
    for (auto ptr_number : ptr_numbers)
       std::cout << *ptr_number << std::endl;
    std::cout << std::endl;
}

The result looks roughly like this:

1

some random number
2

some random number
some random number
3

So it seems as though when I push_back() to the numbers vector, its older elements change their location.

So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?

Edit: Is std::vector contiguous only since C++17? (Just to keep the comments on my previous claim relevant to future readers.)

It roughly looks like this (excuse my MS Paint masterpiece):

[Image: vector memory layout]

The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and capacity of the vector.
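
A minimal sketch of that split between the small vector object and its buffer. The helper names buffer_is_outside_object and handle_size are made up for illustration; they only check that the element storage does not live inside the std::vector object itself and that the object's size does not depend on the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true when the element buffer lies outside the
// bytes occupied by the std::vector object itself.
bool buffer_is_outside_object(const std::vector<int>& v) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&v);
    const auto obj_end   = obj_begin + sizeof(v);
    const auto buf       = reinterpret_cast<std::uintptr_t>(v.data());
    return buf < obj_begin || buf >= obj_end;
}

// sizeof(std::vector<int>) is a compile-time constant: it does not depend on
// how many elements a particular vector currently holds.
std::size_t handle_size() {
    return sizeof(std::vector<int>);
}
```

Calling buffer_is_outside_object on a vector of 1000 ints should return true, since 1000 ints cannot fit into an object that is only a few pointers wide.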


So it seems as though when I push_back() to the numbers vector, its older elements change their location.

The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.
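
To watch this happen, something like the following could be used. It is only a sketch: GrowthLog and observe_growth are names invented here. It counts how often capacity() changes while appending, and checks that data() stays put whenever the capacity did not change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of what was observed while appending.
struct GrowthLog {
    std::size_t reallocations = 0; // how many times capacity() changed
    bool stable_between = true;    // data() unchanged whenever capacity() was unchanged
};

GrowthLog observe_growth(std::size_t n) {
    std::vector<int> v;
    GrowthLog log;
    for (std::size_t i = 0; i < n; ++i) {
        const int* data_before = v.data();
        const std::size_t cap_before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap_before) {
            ++log.reallocations;        // a new, larger buffer was allocated
        } else if (v.data() != data_before) {
            log.stable_between = false; // should never happen without reallocation
        }
    }
    return log;
}
```

The exact number of reallocations depends on the standard library in use, so the sketch deliberately does not assume a specific count.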


Does it maybe store them together, but moves them all together, when more space is needed?

Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
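
If the element addresses themselves are not needed, one common way around the problem is to remember positions instead of pointers. This is a sketch of that idea applied to the loop from the question, not something the question asked for; collect_indices is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Variant of the question's loop that stores indices rather than addresses.
// An index is relative to whatever buffer the vector currently owns, so it
// stays meaningful after a reallocation.
std::vector<std::size_t> collect_indices(std::vector<int>& numbers, int count) {
    std::vector<std::size_t> indices;
    for (int i = 0; i < count; ++i) {
        numbers.push_back(i);
        indices.push_back(numbers.size() - 1); // position of the element just added
    }
    return indices;
}
```

Afterwards numbers[indices[k]] reads element k, no matter how many times numbers was reallocated in between.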


I am aware, that std::vector is a contiguous container only since C++17

The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.
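
What "contiguous" means in practice can be checked directly: element i sits exactly i positions after data(). A small sketch, with is_laid_out_contiguously being a made-up helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when every element i is located at data() + i, i.e. when
// plain pointer arithmetic on data() reaches each element.
bool is_laid_out_contiguously(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) {
            return false;
        }
    }
    return true;
}
```

For std::vector this should always return true. Adjacent elements are therefore sizeof(int) bytes apart.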

The Answer

It's a single contiguous storage (a 1d array). Each time it runs out of capacity it gets reallocated and stored objects are moved to the new larger place — this is why you observe addresses of the stored objects changing.

It has always been this way, not just since C++17.

TL;DR

The storage grows geometrically to satisfy the amortized O(1) requirement for push_back(). The growth factor is 2 in most implementations of the C++ Standard Library (GCC, Clang, STLPort) and 1.5 in the MSVC variant.
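
The growth pattern of a particular implementation can be recorded with a few lines. This is a sketch; capacity_steps is a hypothetical helper that returns each distinct capacity() value seen while appending n elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n elements and records every distinct capacity() value observed.
std::vector<std::size_t> capacity_steps(std::size_t n) {
    std::vector<int> v;
    std::vector<std::size_t> steps;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (steps.empty() || steps.back() != v.capacity()) {
            steps.push_back(v.capacity()); // capacity changed on this push_back
        }
    }
    return steps;
}
```

With geometric growth the returned list is short compared to n (on the order of log n entries), which is what makes push_back amortized constant time. The concrete values depend on the standard library, so they are not listed here.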

[Image: a growing std::vector]

If you pre-allocate it with vector::reserve(N) and sufficiently large N , then addresses of the stored objects won't be changing when you add new ones.
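
Applied to the loop from the question, that could look like the sketch below. collect_pointers is a hypothetical wrapper around the original loop; the only real change is the reserve() call before the first push_back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The question's loop, with the capacity reserved up front so that none of
// the push_back calls inside the loop has to reallocate.
std::vector<int*> collect_pointers(std::vector<int>& numbers, std::size_t count) {
    numbers.reserve(numbers.size() + count); // one allocation, done before taking addresses
    std::vector<int*> ptr_numbers;
    for (std::size_t i = 0; i < count; ++i) {
        numbers.push_back(static_cast<int>(i)); // capacity is sufficient, elements stay put
        ptr_numbers.push_back(&numbers.back());
    }
    return ptr_numbers;
}
```

The pointers stay valid only as long as numbers is not grown beyond the reserved capacity afterwards.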

In most practical applications it is usually worth pre-allocating at least 32 elements to skip the first few reallocations that follow shortly after one another (0→1→2→4→8→16).

It is also sometimes practical to slow the growth down, switch to an arithmetic growth policy, or stop growing entirely after some reasonably large size to ensure the application does not waste or run out of memory.
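
std::vector has no setting for its growth policy, so an arithmetic policy has to be layered on top with reserve(). A rough sketch, assuming a hypothetical helper push_back_linear and a caller-chosen step:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends value, and when the vector is full asks for
// a fixed number of extra slots (arithmetic growth) before the push_back.
void push_back_linear(std::vector<int>& v, int value, std::size_t step) {
    assert(step > 0);
    if (v.size() == v.capacity()) {
        // reserve() guarantees at least this capacity; an implementation
        // is allowed to hand out more.
        v.reserve(v.capacity() + step);
    }
    v.push_back(value); // never needs to grow on its own at this point
}
```

The trade-off is that fixed-size steps give up the amortized O(1) guarantee: appending n elements this way copies on the order of n*n/step elements in total, so it mainly makes sense when memory overhead matters more than append speed.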

Lastly, in some practical applications, like column-based object storages, it may be worth giving up the idea of contiguous storage completely in favor of a segmented one (same as what std::deque does but with much larger chunks). This way the data may be stored reasonably well localized for both per-column and per-row queries (though this may need some help from the memory allocator as well).
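
The standard segmented container already shows the effect on the original problem: push_back on a std::deque invalidates iterators, but pointers and references to existing elements stay valid. A sketch of the question's loop with std::deque swapped in (collect_pointers_deque is a made-up name):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Same loop as in the question, but the numbers live in a std::deque.
// Appending at the end of a deque never relocates existing elements.
std::vector<int*> collect_pointers_deque(std::deque<int>& numbers, std::size_t count) {
    std::vector<int*> ptr_numbers;
    for (std::size_t i = 0; i < count; ++i) {
        numbers.push_back(static_cast<int>(i));
        ptr_numbers.push_back(&numbers.back()); // remains valid after later push_backs
    }
    return ptr_numbers;
}
```

The price is that the elements are no longer in one block, so there is no data() and no pointer arithmetic across the whole container.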

std::vector being a contiguous container means exactly what you think it means.

However, many operations on a vector can re-locate that entire piece of memory.

One common case is adding an element: when the vector must grow, it can reallocate and copy all elements to another contiguous piece of memory.

So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?

That's exactly how it works, and it is why appending elements invalidates all iterators as well as pointers to elements when a reallocation takes place¹. This is not new in C++17; it has always been the case.

There are a couple of benefits from this approach:

  • It is very cache-friendly and hence efficient.
  • The data() method can be used to pass the underlying raw memory to APIs that work with raw pointers.
  • The cost of allocating new memory upon push_back boils down to amortized constant time, because the geometric growth spreads the reallocation cost over many calls (whenever the capacity is exhausted, it is doubled in libc++ and libstdc++, and grows by a factor of approx. 1.5 in MSVC).
  • It allows for the most demanding iterator category, i.e. random access iterators, because classical pointer arithmetic works out well when the data is contiguously stored.
  • Move construction of a vector instance from another one is very cheap.
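
Two of these points can be illustrated briefly. In the sketch below, sum_raw is a stand-in for any API that only understands a pointer plus a length, and move_keeps_buffer is a hypothetical check that moving a vector hands over the buffer rather than copying the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a C-style API that works on raw pointers.
long sum_raw(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i]; // plain pointer indexing over the contiguous block
    }
    return total;
}

// Move construction transfers ownership of the buffer, so the new vector
// reports the same data() address the source had before the move.
bool move_keeps_buffer(std::vector<int> source) {
    const int* before = source.data();
    std::vector<int> target = std::move(source);
    return target.data() == before;
}
```

A call such as sum_raw(v.data(), v.size()) is all it takes to pass a vector to such an API.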

These implications can be considered the downside of such a memory layout:

  • All iterators and pointers to elements are invalidated by modifications of the vector that cause a reallocation. This can lead to subtle bugs, e.g. when erasing elements while iterating over a vector (erase additionally invalidates iterators at and after the erased position, even without a reallocation).
  • Operations like push_front (as std::list or std::deque provide) aren't available (insert(vec.begin(), element) works, but is possibly expensive¹), and neither is efficient merging/splicing of multiple vector instances.
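
For the erase-while-iterating case, two well-known patterns avoid touching an invalidated iterator. Both are sketched here on the made-up task of removing even numbers; the function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pattern 1: continue from the iterator that erase() returns instead of
// incrementing the one that was just invalidated.
void erase_even_loop(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0) {
            it = v.erase(it); // returns the iterator following the erased element
        } else {
            ++it;
        }
    }
}

// Pattern 2: the erase-remove idiom. remove_if shifts the kept elements to
// the front in one pass, then a single erase() drops the leftover tail.
void erase_even_idiom(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x % 2 == 0; }),
            v.end());
}
```

The second form is usually preferable for vectors, because each erase() in the first form shifts all following elements.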

¹ Thanks to @FrançoisAndrieux for pointing that out.

In terms of the actual structure, an std::vector looks something like this in memory:

struct vector {    // Simple C struct as example (T is the type supplied by the template)
  T *begin;        // vector::begin() probably returns this value
  T *end;          // vector::end() probably returns this value
  T *end_capacity; // First non-valid address
  // Allocator state might be stored here (most allocators are stateless)
};

Relevant code snippet from the libc++ implementation as used by LLVM

Printing the raw memory contents of an std::vector :
(Don't do this if you don't know what you're doing!)

#include <iostream>
#include <vector>

struct vector {
    int *begin;
    int *end;
    int *end_capacity;
};

int main() {
    union vecunion {
        std::vector<int> stdvec;
        vector           myvec;
        ~vecunion() { /* do nothing */ }
    } vec = { std::vector<int>() };
    union veciterator {
        std::vector<int>::iterator stditer;
        int                       *myiter;
        ~veciterator() { /* do nothing */ }
    };

    vec.stdvec.push_back(1); // Add something so we don't have an empty vector

    std::cout
      << "vec.begin          = " << vec.myvec.begin << "\n"
      << "vec.end            = " << vec.myvec.end << "\n"
      << "vec.end_capacity   = " << vec.myvec.end_capacity << "\n"
      << "vec's size         = " << vec.myvec.end - vec.myvec.begin << "\n"
      << "vec's capacity     = " << vec.myvec.end_capacity - vec.myvec.begin << "\n"
      << "vector::begin()    = " << (veciterator { vec.stdvec.begin() }).myiter << "\n"
      << "vector::end()      = " << (veciterator { vec.stdvec.end()   }).myiter << "\n"
      << "vector::size()     = " << vec.stdvec.size() << "\n"
      << "vector::capacity() = " << vec.stdvec.capacity() << "\n"
      ;
}

If you code it this way instead, you will see that the values remain the same and that the addresses of adjacent elements in the vector differ by 4 bytes, i.e. sizeof(int) on most platforms:

// needs <iostream>, <vector> and <cstddef>
std::vector<int> numbers;
std::vector<int*> ptr_numbers;

// add the values 0 through 7 to the vector called numbers
for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
}

// print the values inside numbers and store the address of each element
// in ptr_numbers; numbers no longer grows here, so the addresses stay valid
for (std::size_t i = 0; i != numbers.size(); i++) {
    std::cout << numbers[i] << std::endl;
    ptr_numbers.push_back(&numbers[i]);
}
std::cout << std::endl;

// print the value each pointer in ptr_numbers refers to
for (std::size_t y = 0; y != ptr_numbers.size(); y++) {
    std::cout << *ptr_numbers[y] << std::endl;
}

// print the stored addresses, i.e. where each element of numbers lives
// (printing &ptr_numbers[y] would show where the pointers themselves live)
for (std::size_t y = 0; y != ptr_numbers.size(); y++) {
    std::cout << ptr_numbers[y] << std::endl;
}

When you loop through both vectors, they output the same values.
