User space memory fragmentation

Let's suppose a very basic C++ program that allocates a huge number of small std::vectors. I don't really know how the compiler and the OS will place those vectors in the process's memory space, but if the number is large, I think some vectors could end up close to each other.

Now, let's suppose I delete some of the vectors and keep a few others. Imagine I want to add 10000 items to the first vector.

What will happen if the second vector is too close in memory? Will I get a "low memory" error, or will the OS move the first vector?

No, it does not matter whether vectors are close to each other. You will get an error only if a vector grows to a size for which no sufficiently large contiguous block of memory is left (with the default allocator, a std::bad_alloc exception is thrown).

What happens internally is similar to what you probably mean by moving, but in C++11 that term has a different meaning, so I will avoid it and call it reallocation instead. Also note that the operating system has nothing to do with it.

Let us look at the details:

It is correct that a std::vector is contiguous, but (in contrast to a std::array) its elements are not stored directly inside the std::vector instance itself. Instead, it stores the underlying array on the heap and only keeps a pointer to it.
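
A small sketch of this separation, assuming a typical standard library implementation. The helper name `elements_live_outside_object` is made up for illustration. It checks whether the element storage lies outside the bytes occupied by the vector object itself, and the `static_assert` contrasts that with std::array, which holds its elements inline.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if the vector's element storage lies outside
// the bytes occupied by the std::vector object itself.
bool elements_live_outside_object(const std::vector<int>& v) {
  auto obj_begin = reinterpret_cast<std::uintptr_t>(&v);
  auto obj_end = obj_begin + sizeof(v);
  auto data = reinterpret_cast<std::uintptr_t>(v.data());
  return data < obj_begin || data >= obj_end;
}

// A std::array stores its elements inline, so the object is at least
// as large as all of its elements together.
static_assert(sizeof(std::array<int, 1000>) >= 1000 * sizeof(int),
              "std::array keeps its elements inside the object");
```

On common implementations the vector object is only a small handle (a few pointers), no matter how many elements it manages; the exact size is not fixed by the standard.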

For efficiency reasons, implementations are allowed to make the internal array bigger than the number of elements stored in it. For example:

std::vector<int> v;
assert(v.size() == 0);
assert(v.capacity() >= v.size()); // at least v.size(), but can be higher

When you add new elements to the vector (e.g., via v.push_back), one of the following two things will happen:

  • If there is enough space left (i.e., v.size() < v.capacity()), the new element can be added without any extra memory allocation.
  • Otherwise, the underlying array has to grow, which involves the following steps:

    1. A new (larger) array is allocated.
    2. All elements from the old array have to be copied (or, since C++11, moved) to the new array.
    3. The new array replaces the old array (which is deallocated), and the new element can be inserted.
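
The three steps above can be sketched by hand for a plain int array. This is only an illustration under simplifying assumptions (trivially copyable elements, no exception safety), not how any particular std::vector implementation is written, and `grow_array` is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the three steps for a plain int array.
// Returns the new array; the caller owns it and must delete[] it.
int* grow_array(int* old_data, std::size_t size, std::size_t new_capacity) {
  assert(new_capacity >= size);
  int* new_data = new int[new_capacity];           // 1. allocate a larger array
  std::copy(old_data, old_data + size, new_data);  // 2. copy the existing elements
  delete[] old_data;                               // 3. release the old array
  return new_data;
}
```

Note that during step 2 the old and the new array exist at the same time, so the new block has to be found while the old one is still occupied.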

It is important to note that the std::vector instance itself stays at the same memory address; only its internal pointer now points to the newly created (larger) array. In that respect, the data has been moved to a different memory location. (That also has consequences: for instance, all iterators, pointers, and references to the elements that you kept are now invalidated.)
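
A minimal sketch of that effect, assuming the default allocator. The helper `storage_moved_after_growth` is a hypothetical name; it appends elements until the capacity changes and reports whether the element storage ended up at a different address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pushes elements until one reallocation has happened
// and returns true if the element storage now lives at a different address.
bool storage_moved_after_growth(std::vector<int>& v) {
  const int* data_before = v.data();
  const std::size_t capacity_before = v.capacity();
  while (v.capacity() == capacity_before) {
    v.push_back(42);  // the last push_back in this loop triggers the reallocation
  }
  // Any pointer, reference, or iterator obtained before the loop
  // (such as data_before) must not be dereferenced from here on.
  return v.data() != data_before;
}
```

The vector object passed in by reference is the same object before and after the call; only the array it points to has been replaced.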

The critical operation is the reallocation of the memory. Here, memory fragmentation comes into play. Because of fragmentation, it can be impossible to allocate the new array even though there would be enough free space in total if the memory were not fragmented.

In contrast to other languages, C++ cannot avoid fragmentation the way a compacting garbage collector does (e.g., some GC implementations in Java are compacting). In the same way, the operating system cannot help to avoid memory fragmentation in C++. At least in theory. In practice, on today's 64-bit systems (with virtual memory), memory fragmentation is less of a concern than it used to be.

If you do not need the elements in your container to be contiguous, you can use std::deque instead of std::vector. It is more robust against memory fragmentation because it does not keep one big array but several smaller blocks. On the other hand, std::vector is typically more efficient, so I would still use the vector by default, but here is an old article from Herb Sutter that touches on the topic: Using Vector and Deque
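
A short sketch of the difference, with a made-up helper name `first_element_stays_put`. Because a std::deque adds new blocks instead of replacing one big array, appending at the back leaves references to existing elements valid (its iterators are still invalidated).

```cpp
#include <cassert>
#include <deque>

// Hypothetical helper: returns true if the address of the first element is
// unchanged after appending `extra` more elements at the back.
bool first_element_stays_put(std::deque<int>& d, int extra) {
  assert(!d.empty());
  const int* first_before = &d.front();
  for (int i = 0; i < extra; ++i) {
    d.push_back(i);  // may allocate new blocks, never relocates old elements
  }
  return &d.front() == first_before;
}
```

The same experiment with a std::vector would usually report a changed address once the capacity is exceeded.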

When your std::vector runs out of capacity, it reallocates space (typically growing the capacity by a constant factor such as 1.5 or 2; see amortized complexity) and moves the elements already in the vector. It changes the data pointer inside the first vector; it does not move the vector itself (your vector and your vector's data are in different locations).
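
To see the geometric growth in practice, one can count how often the capacity changes while appending. This is a rough sketch; `count_reallocations` is a hypothetical helper, and the exact count depends on the growth factor your standard library uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends n elements to an empty vector and counts
// how many times capacity() changed, i.e. how many reallocations occurred.
std::size_t count_reallocations(std::size_t n) {
  std::vector<int> v;
  std::size_t reallocations = 0;
  std::size_t last_capacity = v.capacity();
  for (std::size_t i = 0; i < n; ++i) {
    v.push_back(static_cast<int>(i));
    if (v.capacity() != last_capacity) {
      ++reallocations;
      last_capacity = v.capacity();
    }
  }
  return reallocations;
}
```

With a constant growth factor, the number of reallocations grows only logarithmically with the number of elements, which is why push_back has amortized constant cost.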

Your std::vector and the elements "inside" it are normally not in the same spot. This incomplete pseudo-implementation is wrong for a number of reasons but might illustrate how push_back scales internally:

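#include <cstddef>
#include <cstdlib>

namespace toy {  // not namespace std: adding your own vector there is not allowed

// Toy sketch for trivially copyable T only; copying a toy::vector is unsafe.
template <typename T>
class vector {
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
  T* data_ = nullptr;  // Stored elsewhere on the heap.

 public:
  ~vector() { std::free(data_); }

  void push_back(const T& foo) {
    if (size_ == capacity_) {
      capacity_ = capacity_ ? capacity_ * 2 : 1;  // assumes no size overflow
      // Assumes trivially copyable types and no realloc failures.
      data_ = static_cast<T*>(std::realloc(data_, capacity_ * sizeof(T)));
    }
    data_[size_++] = foo;  // store at the old size, then increment
  }
};

}  // namespace toy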

realloc here may move the vector's data, so any old pointers or references such as &vector[0] are garbage after push_back reallocates the vector. realloc takes care of finding a contiguous segment that is large enough to hold the new capacity (it might have to mmap more memory).

Another example that explains the separation:

#include <vector>

int main() {
  std::vector<float> numbers;  // the vector is on the stack and never moves.

  numbers.push_back(5.0f);
  // 5.0f is stored inside vector data, which may be on the heap. 
  // Adding more items may allocate heap memory and move all previous items.

  return 0;
}
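
If you know in advance roughly how many items you are going to add (such as the 10000 items from the question), calling reserve up front limits the growth to a single reallocation. A minimal sketch, where `append_without_moving` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `count` items after a single up-front
// reserve() and returns true if the element storage never moved while
// the items were being appended.
bool append_without_moving(std::vector<float>& numbers, std::size_t count) {
  numbers.reserve(numbers.size() + count);  // the only possible reallocation
  const float* data_after_reserve = numbers.data();
  for (std::size_t i = 0; i < count; ++i) {
    numbers.push_back(static_cast<float>(i));  // fits in the reserved capacity
  }
  return numbers.data() == data_after_reserve;
}
```

The reserve call itself can still fail with std::bad_alloc if no contiguous block of the requested size is available.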
