Nested STL vector using too much memory

Question

I have an STL vector My_Partition_Vector of Partition objects, defined as:

struct Partition // the event log data structure
{
    int key;
    std::vector<std::vector<char> > partitions;
    float modularity;
};

The actual nested structure of Partition.partitions varies from object to object, but the total number of chars stored in Partition.partitions is always 16.

I assumed therefore that the total size of the object should be more or less 24 bytes (16 + 4 + 4). However, for every 100,000 items I add to My_Partition_Vector, memory consumption (found using ps -aux) increases by around 20 MB, indicating around 209 bytes for each Partition object.

This is a nearly 9-fold increase! Where is all this extra memory usage coming from? Some kind of padding in the STL vector, or the struct? How can I resolve this (and stop it reaching into swap)?

Answer 1

For one thing, std::vector models a dynamic array, so if you know that you'll always have 16 chars in partitions, using std::vector is overkill. Use a good old C-style array/matrix, boost::array or boost::multi_array.

To reduce the number of reallocations needed when inserting or adding elements (a consequence of its contiguous memory layout), std::vector is allowed to preallocate memory for a certain number of elements up front; its capacity() member function will tell you how many.
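The gap between size() and capacity() is easy to observe. Here is a minimal sketch; the helper names spare_slots and grow_by_push_back are made up for illustration, and the growth policy itself is implementation-defined, so the exact numbers will differ between standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: how many more elements fit before the vector
// has to reallocate its buffer.
std::size_t spare_slots(const std::vector<char>& v)
{
    assert(v.capacity() >= v.size()); // always true for std::vector
    return v.capacity() - v.size();
}

// Hypothetical helper: build a vector one push_back at a time, which is
// where over-allocation usually shows up.
std::vector<char> grow_by_push_back(std::size_t n)
{
    std::vector<char> v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back('x');
    return v;
}
```

Calling spare_slots(grow_by_push_back(5)) often returns a non-zero value, because many implementations grow the buffer geometrically. A vector constructed directly with its final size, as in vector<char>(4), typically has no spare slots, so in that case the per-allocation heap overhead matters more than over-allocation.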

Answer 2

While I think he may be overstating the situation just a tad, I'm in general agreement with DeadMG's conclusion that what you're doing is asking for trouble.

Although I'm generally the one looking at (whatever mess somebody has made) and saying "don't do that, just use a vector", this case might well be an exception. You're creating a huge number of objects that should be tiny. Unfortunately, a vector typically looks something like this:

template <class T>
class vector { 
    T *data;
    size_t allocated;
    size_t valid;
public:
    // ...
};

On a typical 32-bit machine, that's twelve bytes already. Since you're using a vector<vector<char> >, you're going to have 12 bytes for the outer vector, plus twelve more for each vector it holds. Then, when you actually store any data in your vectors, each of those needs to allocate a block of memory from the free store. Depending on how your free store is implemented, you'll typically have a minimum block size -- frequently 32 or even 64 bytes. Worse, the heap typically has some overhead of its own, so it'll add some more memory onto each block for its own bookkeeping (e.g., it might use a linked list of blocks, adding another pointer's worth of data to each allocation).

Just for grins, let's assume you average four vectors of four bytes apiece, and that your heap manager has a 32-byte minimum block size and one extra pointer (or int) for its bookkeeping (giving a real minimum of 36 bytes per block). Multiplying that out, I get 204 bytes apiece -- close enough to your 209 to believe that's reasonably close to what you're dealing with.
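The answer does not spell out how it arrives at 204. One breakdown that reproduces the figure, using only the assumptions stated above, is sketched below. The constant and function names are mine, and this particular breakdown leaves out key, modularity and the heap block that holds the inner vector objects, so treat it as a plausible reconstruction rather than the author's exact calculation.

```cpp
#include <cassert>
#include <cstddef>

// Assumed figures from the paragraph above (typical 32-bit machine).
const std::size_t kVectorObject    = 12; // pointer + two size_t members
const std::size_t kMinHeapBlock    = 32; // assumed minimum allocation size
const std::size_t kHeapBookkeeping = 4;  // one extra pointer (or int) per block
const std::size_t kInnerVectors    = 4;  // assumed average per Partition

// Hypothetical helper reproducing the 204-byte estimate.
std::size_t estimated_bytes_per_partition()
{
    const std::size_t block = kMinHeapBlock + kHeapBookkeeping; // 36
    std::size_t total = kVectorObject;          // the outer vector object
    total += kInnerVectors * kVectorObject;     // four inner vector objects
    total += kInnerVectors * block;             // one heap block per inner vector's chars
    assert(block == 36);
    return total;                               // 12 + 48 + 144
}
```

Under these assumptions the 16 bytes of actual data account for less than a tenth of the total; almost everything else is vector headers and heap block rounding.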

The question at that point is how to deal with the problem. One possibility is to try to work behind the scenes. All the containers in the standard library use allocators to get their memory. While the default allocator gets memory directly from the free store, you can substitute a different one if you choose. If you do some looking around, you can find any number of alternative allocators, many/most of which are meant to help with exactly the situation you're in -- reducing wasted memory when allocating lots of small objects. A couple to look at would be the Boost Pool Allocator and the Loki small object allocator.

Another possibility (that can be combined with the first) would be to quit using a vector<vector<char> > at all, and replace it with something like:

char partitions[16];   // all 16 chars stored inline, partitions back to back
struct parts {         // length of each partition, 4 bits apiece
    int part0 : 4;
    int part1 : 4;
    int part2 : 4;
    int part3 : 4;
    int part4 : 4;
    int part5 : 4;
    int part6 : 4;
    int part7 : 4;
};

For the moment, I'm assuming a maximum of 8 partitions -- if it could be 16, you can add more to parts. This should probably reduce memory usage quite a bit more, but (as-is) will affect your other code. You could also wrap this up into a small class of its own that provides 2D-style addressing to minimize impact on the rest of your code.
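A small wrapper class with 2D-style addressing could look roughly like the sketch below. The class name CompactPartitions and its member functions are hypothetical, and for simplicity it stores each partition length in an unsigned char instead of the 4-bit fields shown above (a signed 4-bit field cannot hold lengths above 7, so whole bytes also sidestep that limit).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical compact replacement for vector<vector<char> >:
// 16 chars stored inline plus up to 8 partition lengths.
class CompactPartitions
{
    char data_[16];           // all characters, partitions back to back
    unsigned char length_[8]; // length_[p] = number of chars in partition p
    unsigned char count_;     // number of partitions in use

    // Index in data_ where partition p starts.
    std::size_t offset(std::size_t p) const
    {
        std::size_t off = 0;
        for (std::size_t q = 0; q < p; ++q) off += length_[q];
        return off;
    }

public:
    CompactPartitions() : count_(0)
    {
        for (int i = 0; i < 16; ++i) data_[i] = 0;
        for (int i = 0; i < 8; ++i) length_[i] = 0;
    }

    // Append a partition of len chars and return its index.
    std::size_t add_partition(std::size_t len)
    {
        assert(count_ < 8 && offset(count_) + len <= 16);
        length_[count_] = static_cast<unsigned char>(len);
        return count_++;
    }

    std::size_t partition_count() const { return count_; }

    std::size_t partition_size(std::size_t p) const
    {
        assert(p < count_);
        return length_[p];
    }

    // 2D-style access: character i of partition p.
    char& operator()(std::size_t p, std::size_t i)
    {
        assert(p < count_ && i < length_[p]);
        return data_[offset(p) + i];
    }
};
```

With only char-sized members the object needs no heap allocation at all, so a vector of these stores its elements contiguously. The trade-off is that partitions can only be appended, not resized in the middle, which may or may not fit the original code.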

Answer 3

If you store a near-constant number of objects, then I suggest using a 2-dimensional array.

The most likely reason for the memory consumption is debug data. STL implementations usually store a lot of debug data. Never profile an application with debug flags on.

Answer 4

On my system, sizeof(vector) is 24. This probably corresponds to three 8-byte members: capacity, size, and pointer. Additionally, you need to consider the actual allocations, which would be between 1 and 16 bytes (plus allocation overhead) for each inner vector and between 24 and 384 bytes for the outer vector (sizeof(vector) * partitions.capacity()).

I wrote a program to sum this up...

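   #include <iostream>
   #include <vector>

   using std::cerr;
   using std::vector;

   struct Partition // same layout as in the question
      {
      int key;
      vector< vector<char> > partitions;
      float modularity;
      };

   int main ()
      {
      for ( int Y=1; Y<=16; Y++ )
         {
         const int X = 16/Y;
         if ( X*Y != 16 ) continue; // ignore imperfect geometries

         Partition a;
         a.partitions = vector< vector<char> >( Y, vector<char>(X) );

         int sum = sizeof(a); // main structure
         sum += sizeof(vector<char>) * a.partitions.capacity(); // outer vector's buffer
         for ( int i=0; i<(int)a.partitions.size(); i++ )
            sum += sizeof(char) * a.partitions[i].capacity(); // inner vector's buffer

         cerr <<"X="<<X<<", Y="<<Y<<", size = "<<sum<<"\n";
         }
      return 0;
      }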

The results show how much memory (not including allocation overhead) is needed for each simple geometry...

X=16, Y=1, size = 80
X=8, Y=2, size = 104
X=4, Y=4, size = 152
X=2, Y=8, size = 248
X=1, Y=16, size = 440

Look at how "sum" is calculated to see what all of the components are.

The results posted are based on my 64-bit architecture. If you have a 32-bit architecture, the sizes would be almost half as much -- but still a lot more than what you had expected.

In conclusion, std::vector<> is not very space efficient for doing a whole bunch of very small allocations. If your application is required to be efficient, then you should use a different container.

My approach to solving this would probably be to allocate the 16 chars with

std::tr1::array<char,16>

and wrap that with a custom class that maps 2D coordinates onto the array allocation.

Below is a very crude way of doing this, just as an example to get you started. You would have to change this to meet your specific needs -- especially the ability to specify the geometry dynamically.

   #include <tr1/array>

   template< typename T, int YSIZE, int XSIZE >
   class array_2D
      {
      std::tr1::array<T,YSIZE*XSIZE> data; // element type is T, stored row by row
   public:
      T & operator () ( int y, int x ) { return data[y*XSIZE+x]; } // preferred accessor (avoid pointers)
      T * operator [] ( int index ) { return &data[index*XSIZE]; } // alternative accessor (mimics boost::multi_array syntax)
      };
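To check that a fixed-size wrapper like this adds no per-object overhead, one can look at its sizeof. The sketch below is a variant of the class above that swaps std::tr1::array for std::array from C++11, purely so it builds without TR1 headers. The names array_2D_cxx11 and grid_bytes are made up for illustration, and the bounds check is my addition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Variant of array_2D using std::array (C++11) instead of std::tr1::array.
template <typename T, int YSIZE, int XSIZE>
class array_2D_cxx11
{
    std::array<T, YSIZE * XSIZE> data; // one flat block, row after row
public:
    T& operator()(int y, int x)
    {
        assert(0 <= y && y < YSIZE && 0 <= x && x < XSIZE);
        return data[y * XSIZE + x]; // map 2D coordinates to the flat index
    }
};

// Hypothetical helper: bytes occupied by one 4x4 grid of chars.
std::size_t grid_bytes()
{
    return sizeof(array_2D_cxx11<char, 4, 4>);
}
```

On the mainstream standard libraries grid_bytes() comes out as 16, i.e. just the payload, which is what makes a vector of such objects so much cheaper than a vector of nested vectors.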

Answer 5

...This is a bit of a side conversation, but boost::multi_array was suggested as an alternative to the OP's use of nested vectors. My finding was that multi_array was using a similar amount of memory when applied to the OP's operating parameters.

I derived this code from the example at Boost.MultiArray. On my machine, this showed multi_array using about 10x more memory than ideally required, assuming that the 16 bytes are arranged in a simple rectangular geometry.

To evaluate the memory usage, I checked the system monitor while the program was running, and I compiled with

( export CXXFLAGS="-Wall -DNDEBUG -O3" ; make main && ./main )

Here's the code...

   #include <iostream>
   #include <vector>
   #include "boost/multi_array.hpp"
   #include <tr1/array>
   #include <cassert>

   #define USE_CUSTOM_ARRAY 0 // compare memory usage of my custom array vs. boost::multi_array

   using std::cerr;
   using std::vector;

  #if USE_CUSTOM_ARRAY
   template< typename T, int YSIZE, int XSIZE >
   class array_2D
      {
      std::tr1::array<T,YSIZE*XSIZE> data;
   public:
      T & operator () ( int y, int x ) { return data[y*XSIZE+x]; } // preferred accessor (avoid pointers)
      T * operator [] ( int index ) { return &data[index*XSIZE]; } // alternative accessor (mimics boost::multi_array syntax)
      };
  #endif

int main ()
   {

   int COUNT = 1024*1024;

  #if USE_CUSTOM_ARRAY
   vector< array_2D<char,4,4> > A( COUNT );
   typedef int index;
  #else
   typedef boost::multi_array<char,2> array_type;
   typedef array_type::index index;
   vector<array_type> A( COUNT, array_type(boost::extents[4][4]) );
  #endif

  // Assign values to the elements
  int values = 0;
  for ( int n=0; n<COUNT; n++ )
     for(index i = 0; i != 4; ++i) 
       for(index j = 0; j != 4; ++j)
           A[n][i][j] = values++;

// Verify values
   int verify = 0;
    for ( int n=0; n<COUNT; n++ )
       for(index i = 0; i != 4; ++i) 
          for(index j = 0; j != 4; ++j)
             {
             assert( A[n][i][j] == (char)(verify&0xFF) ); // note: compiled out under -DNDEBUG
             ++verify; // keep the side effect outside the assert
            #if USE_CUSTOM_ARRAY
             assert( A[n][i][j] == A[n](i,j) ); // testing accessors
            #endif
             }

   cerr <<"spinning...\n";
   while ( 1 ) {} // wait here (so you can check memory usage in the system monitor)

   return 0;
   }

Answer 6

16 bytes is a complete and total waste. You're storing a hell of a lot of data about very small objects. A vector of vectors is the wrong solution to use. You should log sizeof(vector) - it's not insignificant, as it performs a substantial function. On my compiler, sizeof(vector) is 20. So each Partition is 4 + 4 + 16 + 20 + 20 * number of inner partitions + memory overheads like the vectors not being the perfect size.

You're only storing 16 bytes of data, and wasting ridiculous amounts of memory allocating them in the most segregated, highest-overhead way you could possibly think of. The vector doesn't use a lot of memory - you have a terrible design.

6 answers

Answer 1
3 2010-08-21 19:44:45

Answer 2
2 2010-08-21 20:07:57

Answer 3
1 2010-08-21 19:40:26

Answer 4
1 2010-08-21 19:56:50

Answer 5
1 2010-08-21 22:47:26

Answer 6
0 2010-08-21 19:14:07