std::vector<std::vector<T> > vs std::vector<T*>

Given that memory overhead is critical to my application, I think that of the two options above, the latter would be more lightweight. Am I correct? I am basing this on the fact that a vector has a memory overhead of 4 pointers to keep track of begin(), end(), size() and the allocator. So the total size for the whole model would be on the order of

(4*sizeof(T*) + sizeof(T)*Ni)*No + 3*sizeof(std::vector<T>*)
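The estimate above can be turned into a small calculation. The following is only a rough sketch: the helper names nestedVectorBytes and pointerVectorBytes are made up for illustration, sizeof(std::vector<T>) is implementation-defined (so it may not equal 4 pointers), and allocator bookkeeping and unused capacity are ignored. It also counts the one T* per inner array that the pointer variant still has to store.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: rough size of std::vector<std::vector<T>> holding
// `no` inner vectors of `ni` elements each. Counts one vector header per
// inner vector plus the element payload; ignores allocator bookkeeping
// and any unused capacity.
template <typename T>
std::size_t nestedVectorBytes(std::size_t no, std::size_t ni) {
    return (sizeof(std::vector<T>) + sizeof(T) * ni) * no
           + sizeof(std::vector<std::vector<T>>);
}

// Hypothetical helper: rough size of std::vector<T*> where each pointer
// refers to a separately stored array of `ni` elements. Counts one pointer
// per inner array plus the element payload.
template <typename T>
std::size_t pointerVectorBytes(std::size_t no, std::size_t ni) {
    return (sizeof(T*) + sizeof(T) * ni) * no + sizeof(std::vector<T*>);
}
```

Under these assumptions the saving per inner array is sizeof(std::vector<T>) - sizeof(T*), so calling both helpers with the real No and an average Ni gives a first idea of whether the switch is worth it.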

Here, I am assuming Ni and No to be the number of elements in the inner and outer vectors, respectively. By using the latter option, I am hoping to save the 4*sizeof(T*)*No, since in my application No is huge, while Ni <<<< No. Just to fix ideas, No is on the order of 100 million or more, and Ni is typically on the order of 3 to 50.

Thanks in advance for your interest and any ideas.

NOTE: I understand and am more than happy to pay the price of dealing with the pointer, including allocating, traversing, and deallocating it, and I can do so without any significant performance overhead.

It's actually 4, you missed the allocator. See "What is the overhead cost of an empty vector?"

Depends on your application. Do you never append to the internal vectors? Do they all have the same number of elements? Is the average size of the data stored in the internal vectors small?

If you answered yes to all the questions above, then maybe T* is an acceptable solution. If not, think about how you would handle that issue without the support of vector. It might be easier to just take the hit on memory.

As you see here, the exact overhead of a std::vector is implementation-dependent.

Also note that if No is very large, it is quite likely that some implementations will store your data in chunks, in which case you also have an overhead on the order of the number of chunks.

But in general I agree that the pointer implementation is cheaper space-wise.

I think that [the vector<T*>] would be better. Am I correct?

It would be smaller, but it wouldn't necessarily be "better". The change would saddle you with the necessity to allocate and free inner arrays. You would no longer have a way of knowing the size of the inner array.

Also note that some size overhead would remain: as long as your inner arrays are allocated individually, the allocator reserves some additional storage on top of the size of each requested chunk, so that the deallocation routines know the size of the chunk.

If your memory requirements are that tight, consider allocating one vector for the whole array, and then parcel out the individual chunks into a vector of pointers. This would eliminate the per-chunk overhead of allocating the inner arrays individually.
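One way this could look is sketched below. PackedArrays is a hypothetical helper, not something from the standard library, and it assumes that the inner arrays never change size after construction and that T is not bool. The sketch builds itself from a nested vector only to stay short; with 100 million inner arrays you would fill the store directly instead of materializing the nested vector first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: packs all inner arrays into one contiguous store and
// keeps one pointer per inner array plus a single end sentinel.
template <typename T>
struct PackedArrays {
    std::vector<T> store;    // all elements, back to back
    std::vector<T*> starts;  // starts[i] = first element of inner array i;
                             // starts.back() = one past the last element

    explicit PackedArrays(const std::vector<std::vector<T>>& src) {
        std::size_t total = 0;
        for (const auto& inner : src) total += inner.size();
        store.reserve(total);  // single allocation, so no reallocation below
        for (const auto& inner : src)
            store.insert(store.end(), inner.begin(), inner.end());

        starts.reserve(src.size() + 1);
        T* p = store.data();
        for (const auto& inner : src) {
            starts.push_back(p);
            p += inner.size();
        }
        starts.push_back(p);  // end sentinel
    }

    // Copying would leave `starts` pointing into the source object's store.
    PackedArrays(const PackedArrays&) = delete;
    PackedArrays& operator=(const PackedArrays&) = delete;

    std::size_t count() const { return starts.size() - 1; }
    std::size_t size(std::size_t i) const {
        return static_cast<std::size_t>(starts[i + 1] - starts[i]);
    }
    T* begin(std::size_t i) const { return starts[i]; }
    T* end(std::size_t i) const { return starts[i + 1]; }
};
```

With this layout there are only two heap allocations in total, and the size of inner array i is recovered from the distance between two neighbouring pointers rather than stored separately.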

If you are concerned about the overhead of a vector, you should also be concerned about the overhead of malloc() / new: typical memory allocator overhead is at least two more pointers per memory region, which brings the overhead of a small vector<> up to five pointers (sizeof(vector<int>) == 3*sizeof(void*) on Linux).
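The figures above are easy to check on your own toolchain. This is a minimal sketch: the function names are invented for illustration, and the two-pointer allocator overhead is passed in as an assumption, since the real value depends on the allocator and cannot be queried portably.

```cpp
#include <cstddef>
#include <vector>

// Size of a std::vector<int> object itself, in pointer-sized words.
// Implementation-defined; the answer above reports 3 on Linux.
constexpr std::size_t vectorHeaderWords() {
    return sizeof(std::vector<int>) / sizeof(void*);
}

// Estimated fixed cost of one small heap-backed vector: the vector object
// plus an ASSUMED allocator bookkeeping cost of `allocatorWords` pointers
// for its single heap block.
constexpr std::size_t smallVectorOverheadBytes(std::size_t allocatorWords) {
    return sizeof(std::vector<int>) + allocatorWords * sizeof(void*);
}
```

Multiplying smallVectorOverheadBytes(2) by No gives the fixed cost that the nested layout pays before a single element is stored.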

So, what I would do is ask myself whether the size of the inner arrays needs to change once they have been initialized. If it is possible to avoid later reallocation of those arrays, I would allocate one huge chunk of memory, which I can then distribute to the different inner arrays, storing only their location:

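int** pointerArray = new int*[innerArrayCount + 1];
int* store = new int[totalSizeOfInnerArrays];
int* nextArray = store;
for(int i = 0; i < innerArrayCount; i++) {
    pointerArray[i] = nextArray;    //Start of inner array i.
    nextArray += innerArraySize[i];
}
//End sentinel, set outside the loop so that innerArraySize is never read out of bounds.
pointerArray[innerArrayCount] = nextArray;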

The size of an array can then be deduced from the difference between the next pointer and its own:

for(int i = 0; i < innerArrayCount; i++) {
    int* curArray = pointerArray[i];
    size_t curSize = pointerArray[i + 1] - pointerArray[i];
    //Do whatever you like with curArray.
}

Or, you can directly use that end pointer for iterating over the inner arrays:

for(int i = 0; i < innerArrayCount; i++) {
    for(int* iterator = pointerArray[i]; iterator < pointerArray[i + 1]; iterator++) {
        //Do whatever you like with *iterator.
    }
}
