Locality and shared access to objects

Profiling my code, I see a lot of cache misses and would like to know whether there is a way to improve the situation. Optimization is not really needed; I'm more curious about whether general approaches to this problem exist (this is a follow-up question).

// class to compute stuff
class A {
    double compute();
    ...
    // depends on other objects
    std::vector<A*> dependencies;
};

I have a container class that stores pointers to all created objects of class A. I do not store copies because I want shared access. I used shared_ptr before, but since a single A is meaningless without the container, raw pointers are fine.

class Container {
    ...
    void compute_all();
    std::vector<A*> objects;
    ...
};

The vector objects is insertion-sorted so that the full evaluation can be done by simply iterating over it and calling compute() on each A; in that order, all dependencies in A are resolved.

With objects a_i of class A, the evaluation might look like this:

a_1 => a_2 => a_3 --> a_2 --> a_1 => a_4 => ....

where => denotes iteration in Container and --> denotes iteration over A::dependencies.

Moreover, the Container class is created only once and compute_all() is called many times, so rearranging the whole structure after creation is an option and wouldn't harm efficiency much.

Now to the observations/questions:

  1. Obviously, iterating over Container::objects is cache efficient, but accessing the pointees is definitely not.

  2. Moreover, each object of type A has to iterate over A::dependencies, which again can produce cache misses.

Would it help to create a separate vector<A*> of all needed objects in evaluation order, such that the dependencies in A are inserted as copies?

Something like this:

a_1 => a_2 => a_3 => a_2_c => a_1_c => a_4 => ....

where a_i_c are copies of a_i.

Thanks for your help, and sorry if this question is confusing, but I find it rather difficult to extrapolate from simple examples to large applications.

Unfortunately, I'm not sure I understand your question correctly, but I'll try to answer.

Cache misses occur when the processor needs data that is not already in the cache, which happens often when that data is scattered all over memory.

One very common way of increasing cache hits is to organize your data so that everything that is accessed sequentially is in the same region of memory. Judging by your explanation, I think this is most likely your problem: your A objects are scattered all over the place.

If you're just calling regular new every time you need to allocate an A, you'll probably end up with your A objects scattered across memory.

You can create a custom allocator for objects that will be created many times and accessed sequentially. This custom allocator could allocate a large number of objects up front and hand them out as requested. This may be similar to what you meant by reordering your data.

It can take a bit of work to implement this, however, because you have to consider cases such as what happens when the allocator runs out of objects, how it knows which objects have been handed out, and so on.

// This example is very simple. Instead of using new to create an Object,
// the code can just call Allocate() and use the pointer returned.
// This ensures that all Object instances reside in the same region of memory.
struct CustomAllocator {
    CustomAllocator() : nextObject(cache) { }

    // Returns the next unused Object, or nullptr once the cache is exhausted.
    Object* Allocate() {
        if (nextObject == cache + 1024) return nullptr;
        return nextObject++;
    }

    Object* nextObject;
    Object cache[1024];
};
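One of the cases mentioned above is what happens when the allocator runs out of objects. A minimal sketch of one possible answer, assuming it is acceptable to grow in fixed-size chunks: the names ChunkedPool, chunk_count, and the placeholder Object struct are hypothetical and not part of the original code. Objects within a chunk are contiguous, and pointers that were already handed out stay valid when a new chunk is added.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Placeholder for the real class; only here so the sketch compiles.
struct Object {
    double value = 0.0;
};

// Hypothetical pool that hands out Objects from contiguous chunks and
// allocates a fresh chunk whenever the current one is used up.
class ChunkedPool {
public:
    explicit ChunkedPool(std::size_t chunk_size = 1024)
        : chunk_size_(chunk_size), used_(chunk_size) {
        assert(chunk_size > 0);  // a zero-sized chunk could never hand out an Object
    }

    Object* Allocate() {
        if (used_ == chunk_size_) {
            // Current chunk is full (or no chunk exists yet): start a new one.
            chunks_.push_back(std::make_unique<Object[]>(chunk_size_));
            used_ = 0;
        }
        return &chunks_.back()[used_++];
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t chunk_size_;
    std::size_t used_;  // number of Objects handed out from the newest chunk
    std::vector<std::unique_ptr<Object[]>> chunks_;
};
```

This sketch never returns individual objects to the pool; everything is released at once when the pool is destroyed, which seems to match a container that is built once and then evaluated many times.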

Another method involves caching operations that work on sequential data but aren't performed sequentially. I think this is what you meant by having a separate vector.

However, it's important to understand that your CPU doesn't just keep one section of memory in cache at a time. It keeps multiple sections of memory cached.

If you're jumping back and forth between operations on data in one section and operations on data in another section, this most likely will not cause many cache misses; your CPU can and should keep both sections cached at the same time.

If you're jumping between operations on 50 different sets of data, you'll probably encounter many cache misses. In this scenario, caching operations would be beneficial.

In your case, I don't think caching operations will give you much benefit. Ensuring that all of your A objects reside in the same section of memory, however, probably will.
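To make the last point concrete, here is a minimal sketch of one way to keep all objects in the same section of memory: store them by value in a single std::vector, in evaluation order, and refer to dependencies by index instead of by pointer. Node, add, result, and the "input plus sum of dependency results" computation are all hypothetical stand-ins, since the real A::compute() is not shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for class A. Dependencies are indices into the
// owning vector, so they stay valid even if the vector reallocates.
struct Node {
    double input = 0.0;
    double result = 0.0;
    std::vector<std::size_t> dependencies;
};

class Container {
public:
    // Appends a node and returns its index. Dependencies must already exist,
    // which keeps the vector in a valid evaluation order.
    std::size_t add(double input, std::vector<std::size_t> deps) {
        for (std::size_t d : deps) {
            assert(d < nodes_.size());
        }
        nodes_.push_back(Node{input, 0.0, std::move(deps)});
        return nodes_.size() - 1;
    }

    // Walks the nodes front to back; every dependency lies earlier in the
    // same contiguous array and has already been computed.
    void compute_all() {
        for (Node& node : nodes_) {
            double sum = node.input;  // placeholder computation
            for (std::size_t d : node.dependencies) {
                sum += nodes_[d].result;
            }
            node.result = sum;
        }
    }

    double result(std::size_t i) const { return nodes_[i].result; }

private:
    std::vector<Node> nodes_;
};
```

Access is still shared, because any number of nodes can name the same index. Each node's dependency list remains a separate heap allocation in this sketch; if that turns out to matter in the profile, the lists could be packed into one flat index array as well.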

Another thing to consider is threading, but this can get pretty complicated. If your threads are context-switching a lot, you may encounter a lot of cache misses.

+1 for profiling first :)

While using a custom allocator can be the correct solution, I'd certainly recommend two things first:

  • Keep a reference/pointer to an entire vector of A instead of a vector of A*:

class Container {
    ...
    void compute_all();
    std::vector<A>* objects;
    ...
};
  • Use a standard library with custom allocators (I think Boost has some good ones; EASTL is centered around the very concept).
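As a rough illustration of the second suggestion, the sketch below uses neither Boost nor EASTL but the std::pmr facilities from the C++17 standard library, which is an assumption about the available toolchain. It places the storage of a vector of objects inside one preallocated buffer. Item, sum_in_arena, and demo are hypothetical names, and Item merely stands in for A.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical stand-in for the real class A.
struct Item {
    double value;
};

// Builds n items whose storage comes from the given arena and returns the
// sum of their values.
double sum_in_arena(std::pmr::memory_resource& arena, std::size_t n) {
    std::pmr::vector<Item> items(&arena);
    assert(items.get_allocator().resource() == &arena);
    items.reserve(n);  // one contiguous block for all items
    for (std::size_t i = 0; i < n; ++i) {
        items.push_back(Item{static_cast<double>(i)});
    }
    double sum = 0.0;
    for (const Item& item : items) {
        sum += item.value;
    }
    return sum;
}

double demo() {
    // All allocations are carved out of this stack buffer. The null upstream
    // resource makes the arena throw std::bad_alloc instead of silently
    // falling back to the heap if the buffer is ever exhausted.
    alignas(std::max_align_t) std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());
    return sum_in_arena(arena, 16);
}
```

A monotonic resource only releases memory when it is destroyed, so it fits data that is built once and then read many times; a structure with frequent insertions and removals would probably want a different resource.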

$0.02
