
std::vector<A> vs std::vector<A*> difference for CPU

Original post 2014-05-04 17:00:56 8 3 c++/ performance/ cpu/ ram

Let's discuss a case where I have a huge std::vector. I need to iterate over all elements and call a print function. There are two cases: either I store my objects in the vector, so the objects are next to each other in memory, or I allocate my objects on the heap and store pointers to them in the vector. In the second case the objects will be scattered all over RAM.

When copies of the objects are stored in std::vector<A>, the CPU brings data from RAM into the CPU cache one chunk at a time, and each chunk contains multiple elements of the vector. So when you iterate over the elements and call a function on each one, several elements are processed before the CPU has to go back to RAM for the next part of the data. This is good because the CPU does not spend many cycles idle.

What about the case of std::vector<A*>? When the CPU brings in a chunk of pointers, is it easy for it to obtain the objects through those pointers? Or does it have to request from RAM each object on which you call a function, so that there are cache misses and idle CPU cycles? Is it bad for performance compared with the case above?

3 answers

At least in a typical case, when the CPU fetches a pointer (or a number of pointers) from memory, it will not automatically fetch the data to which those pointers refer.

So, in the case of the vector of pointers, when you load the item that each of those pointers refers to, you'll typically get a cache miss, and access will be substantially slower than if the items were stored contiguously. This is particularly true when each item is relatively small, so that a number of them could fit in a single cache line (for some level of cache; keep in mind that a current processor will often have two or three levels of cache, each of which might have a different line size).
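The two layouts described above can be sketched as follows. This is only an illustration: the struct `A` with a single `int` field and the helper names `make_contiguous`, `make_indirect`, and `sum_values` are assumptions, and `std::unique_ptr<A>` stands in for the raw `A*` of the question so the sketch does not leak.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type, used only for illustration.
struct A {
    int value;
};

// Contiguous layout: the elements sit back to back in the vector's buffer,
// so one cache line typically holds several of them.
std::vector<A> make_contiguous(std::size_t n) {
    std::vector<A> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(A{static_cast<int>(i)});
    return v;
}

// Indirect layout: the buffer holds only pointers; each A lives wherever
// the allocator happened to place it.
std::vector<std::unique_ptr<A>> make_indirect(std::size_t n) {
    std::vector<std::unique_ptr<A>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(std::unique_ptr<A>(new A{static_cast<int>(i)}));
    }
    return v;
}

// Walks the buffer linearly; the next element is usually already cached.
long long sum_values(const std::vector<A>& items) {
    long long total = 0;
    for (const A& a : items) total += a.value;
    return total;
}

// Loads a pointer, then performs a second, dependent load to reach the
// object; that second load is the one likely to miss the cache.
long long sum_values(const std::vector<std::unique_ptr<A>>& items) {
    long long total = 0;
    for (const auto& p : items) total += p->value;
    return total;
}
```

Both overloads compute the same result; any speed difference comes from the memory layout alone, and how large it is depends on the element size, the allocator, and the hardware, so it is worth measuring on the target machine.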

It may, however, be possible to mitigate this to some degree. You can overload operator new for a class to control allocations of objects of that class. Using this, you can at least keep objects of that class together in memory. That doesn't guarantee that the items in a particular vector will be contiguous, but it could improve locality enough to make a noticeable improvement in speed.
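A minimal sketch of a class-specific operator new is shown below, assuming a fixed-size static pool with a free list. The pool size `kPoolCapacity`, the `Slot` union, and the fallback to the global operator new are choices made for this illustration, not something prescribed by the answer, and the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <functional>
#include <new>

// Hypothetical class whose instances are carved out of one static pool.
class A {
public:
    explicit A(int v) : value(v) {}
    int value;

    static void* operator new(std::size_t size);
    static void operator delete(void* p) noexcept;
};

namespace {

const std::size_t kPoolCapacity = 1024;  // assumed pool size

// A slot either stores an A or, while free, a link to the next free slot.
union Slot {
    Slot* next_free;
    alignas(A) unsigned char storage[sizeof(A)];
};

Slot g_pool[kPoolCapacity];
Slot* g_free_list = nullptr;     // slots returned by operator delete
std::size_t g_next_unused = 0;   // first slot never handed out so far

bool in_pool(Slot* s) {
    std::less<Slot*> before;     // std::less gives a total order on pointers
    return !before(s, g_pool) && before(s, g_pool + kPoolCapacity);
}

}  // namespace

void* A::operator new(std::size_t size) {
    if (size != sizeof(A)) return ::operator new(size);  // e.g. a derived class
    if (g_free_list != nullptr) {                         // reuse a freed slot
        Slot* s = g_free_list;
        g_free_list = s->next_free;
        return s;
    }
    if (g_next_unused < kPoolCapacity) return &g_pool[g_next_unused++];
    return ::operator new(size);                          // pool exhausted
}

void A::operator delete(void* p) noexcept {
    if (p == nullptr) return;
    Slot* s = static_cast<Slot*>(p);
    if (in_pool(s)) {            // push the slot back onto the free list
        s->next_free = g_free_list;
        g_free_list = s;
    } else {
        ::operator delete(p);    // came from the global fallback
    }
}
```

With this in place, objects created one after another with `new A(...)` start out in neighbouring slots, which is the locality improvement the answer refers to. Once slots are freed and reused, the order in memory no longer follows the order of creation.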

Also note that the vector allocates its data via an Allocator object (which defaults to std::allocator<T>, which, in turn, uses new). Although the interface is kind of a mess, so it's harder than you'd generally like, you can define an allocator that acts differently if you wish. This won't generally have much effect on a single vector, but if (for example) you have a number of vectors (each of fixed size) and want them to use memory next to each other, you could do that via the allocator object.
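As a rough sketch of the allocator idea, the code below lets several fixed-size vectors draw their buffers from one shared arena. `Arena`, `ArenaAllocator`, `IntVec`, and `make_fixed` are hypothetical names, the 4096-byte buffer is an arbitrary assumption, and the arena never reuses memory, so it only suits vectors that do not grow.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bump arena: hands out consecutive slices of one buffer
// and never reuses them.
struct Arena {
    alignas(std::max_align_t) unsigned char buffer[4096];  // assumed size
    std::size_t used = 0;

    void* allocate(std::size_t bytes, std::size_t alignment) {
        // Round the current offset up to the requested alignment.
        std::size_t start = (used + alignment - 1) / alignment * alignment;
        if (start + bytes > sizeof(buffer)) throw std::bad_alloc();
        used = start + bytes;
        return buffer + start;
    }
};

// Minimal C++11 allocator that forwards every request to an Arena.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena& a) noexcept : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // memory goes away with the arena
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

using IntVec = std::vector<int, ArenaAllocator<int>>;

// Builds a vector of n copies of fill whose buffer comes from the arena.
IntVec make_fixed(Arena& arena, std::size_t n, int fill) {
    return IntVec(n, fill, ArenaAllocator<int>(arena));
}
```

Vectors built one after another through `make_fixed` on the same `Arena` receive buffers from the same block of memory, typically right next to each other. The arena has to outlive every vector that uses it.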

If I store my objects in the vector, and the objects will be next to each other in memory, or I allocate my objects on the heap

Regardless of whether you use std::vector<A> or std::vector<A *>, the inner buffer of the vector will be allocated on the heap. You could use an efficient memory pool to manage allocations and deletions, but you would still be working with data on the heap.

Is it bad compared with the case above in the aspect of performance?

If you use std::vector<A *> without any specialized memory management, you may get lucky and have the allocations always land next to each other in memory, but it is generally better to rely on the contiguous allocation performed by std::vector<A>. In the former case, reallocating the entire vector may be cheaper (since pointers are usually smaller than regular structs), but memory accesses will suffer from poor locality.

When it brings a chunk of pointers is it easy for CPU to obtain objects by pointer?

No, it isn't. The CPU doesn't know they are pointers (all it sees is a bunch of bits, with no semantics attached) until it fetches an instruction that dereferences them.

Or it should request from RAM the objects on which you call some functions and there will be cache misses and free CPU cycles?

That's right. The CPU will try to load the data that a cached pointer refers to, but that data is likely to be located far away from recently accessed memory, so the load will be a cache miss.

Is it bad compared with the case above in the aspect of performance?

If the only thing you care about is accessing elements, then yes, it's bad. Yet in some cases a vector of pointers is preferable. Namely, if your objects don't support moving (C++11 isn't mainstream yet), then copying the vector becomes more expensive. Even if you don't copy your vector, you may not know the number of stored elements in advance, so you can't call reserve(n) beforehand. Then all your objects will be copied whenever the vector exhausts its capacity and is forced to reallocate.
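The cost of growing without reserve(n) can be made visible with a small sketch. The copy-only type `Tracked` and the helper `copies_when_filling` are hypothetical names introduced for this illustration; declaring the copy constructor suppresses the implicit move constructor, which mimics a pre-C++11 type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical copy-only type that counts how often it is copied.
struct Tracked {
    static std::size_t copies;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};
std::size_t Tracked::copies = 0;

// Fills a vector with n elements and returns how many copies were made.
std::size_t copies_when_filling(std::size_t n, bool reserve_first) {
    Tracked::copies = 0;
    std::vector<Tracked> v;
    if (reserve_first) v.reserve(n);  // a single allocation up front
    for (std::size_t i = 0; i < n; ++i) {
        Tracked t(static_cast<int>(i));
        v.push_back(t);               // one copy into the vector
    }
    return Tracked::copies;
}
```

With `reserve_first` set, each element is copied exactly once. Without it, every reallocation copies all elements stored so far, so the count is higher; the exact number depends on the growth factor of the standard library in use. A vector of pointers copies only the pointers in that situation.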

But in the end it depends on the concrete type. If your objects are small (tiny structs, ints or floats), it is clearly better to store them by value, because the overhead of the pointers would be too large.