How to achieve cache coherency with an abstract class pointer vector in C++?

Question

I'm making a little game in C++. I found answers on StackExchange sites about cache coherency, and I would like to use it in my game, but I'm using child classes of an abstract class, Entity.

I'm storing all entities in a std::vector so that I can access virtual functions in loops. Entity::update() is a virtual function of Entity overridden by subclasses like PlayerEntity.

In Game.hpp - Private Member Variables:

std::vector<Entity*> mEntities;
PlayerEntity* mPlayer;

In Game.cpp - Constructor:

mPlayer = new PlayerEntity();
mEntities.push_back(mPlayer);

Here's what my update function (in the main loop) looks like:

void Game::update() {
  for (Entity* entity : mEntities) {
    entity->update(mTimeStep, mGeneralClock.getElapsedTime().asMilliseconds());
  }
}

My question is: How do I make my entity objects sit next to each other in memory, and thus achieve cache coherency? I tried simply turning the vector of pointers into a vector of objects and making the appropriate changes, but then I couldn't use polymorphism, for obvious reasons. Side question: what determines where an object is allocated in memory? Am I doing the whole thing wrong? If so, how should I store my entities?

Note: I'm sorry if my English is bad; I'm not a native speaker.

Answer 1

Obviously, first measure which parts are even worth optimizing. Not all games are created equal, and not all code within a game is created equal. There is no use in completely restructuring the script that triggers the end boss's death animation to make it use 1 cache line instead of 2. That said...
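As a crude first measurement, before reaching for a real profiler, you can time a suspect loop with a scope timer. This is a minimal sketch, not from the original answer: `ScopedTimer` and `advancePositions` are hypothetical names, and the timing relies only on the standard `std::chrono::steady_clock`.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical helper: prints the wall-clock time spent in the scope that owns it.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto us =
            std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        std::printf("%s: %lld us\n", label_, static_cast<long long>(us));
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical hot loop to measure: advance every position by dt and return a
// checksum so the compiler cannot discard the work.
double advancePositions(std::vector<double>& positions, double dt) {
    ScopedTimer timer("advancePositions");
    double checksum = 0.0;
    for (double& p : positions) {
        p += dt;
        checksum += p;
    }
    return checksum;
}
```

A real profiler will usually attribute time far better than hand-placed timers; this only tells you whether one particular loop deserves a closer look.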

If you are optimizing for the cache, forget about inheritance and virtual functions, or at least be critical of them. As you note, creating a contiguous array of polymorphic objects is somewhere between hard and error-prone and completely infeasible (depending on whether the subclasses have different sizes).
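To make the size problem concrete: for a closed set of types known at compile time, `std::variant` (C++17; a technique this answer does not itself mention) does give you one contiguous vector of differently typed entities, but every slot is as large as the biggest alternative. A minimal sketch; `MonsterEntity` and the members of both types are hypothetical, and each type has a non-virtual `update(float dt)` rather than overriding `Entity::update`.

```cpp
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical entity types stored by value; no common base class needed.
struct PlayerEntity {
    float x = 0.0f;
    float speed = 2.0f;
    void update(float dt) { x += speed * dt; }
};

struct MonsterEntity {
    float x = 0.0f;
    float speed = 1.0f;
    float waypoints[8] = {};  // extra state makes this type larger than PlayerEntity
    void update(float dt) { x += speed * dt; }
};

// A closed set of entity types stored contiguously.
using AnyEntity = std::variant<PlayerEntity, MonsterEntity>;

// Every slot is at least as big as the largest alternative, so each
// PlayerEntity stored here wastes the difference.
static_assert(sizeof(AnyEntity) >= sizeof(MonsterEntity));

void updateAll(std::vector<AnyEntity>& entities, float dt) {
    for (AnyEntity& e : entities) {
        // std::visit dispatches on the stored alternative; no pointer to a
        // separate heap allocation is followed.
        std::visit([dt](auto& entity) { entity.update(dt); }, e);
    }
}
```

Adding a new entity type means editing the `AnyEntity` alias, which is the price of keeping the storage contiguous.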

You can attempt to create a pool so that entities that are adjacent in the entities vector are more likely to be close to each other in memory, but frankly I doubt you'll do much better than a state-of-the-art general-purpose allocator, especially when the entities' sizes and lifetimes vary significantly. A pool would only help if entities adjacent in the vector are allocated back-to-back, but in that case a good general-purpose allocator gives the same locality advantages. It's not as if tcmalloc and friends pick a random cache line to allocate from just to annoy you.

You might be able to squeeze a bit of memory out of knowing your object types, but this is purely hypothetical and would have to be proven first to justify the effort of implementing it. Also note that a run-of-the-mill pool assumes either that all objects are the same size or that you never deallocate individual objects. Allowing both puts you halfway towards writing a general-purpose allocator, and you're bound to do worse than the existing ones.
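A minimal sketch of the second kind of pool: an arena that places objects back-to-back and destroys them only all at once. `Arena` and `create` are hypothetical names; it assumes C++17 (for `std::byte`) and uses the standard `std::align` and placement new. It keeps a `vector<Entity*>` and the virtual calls intact; only the allocation changes.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical arena: objects are placed back-to-back in one buffer and are
// destroyed only all at once, never individually.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new std::byte[capacity]), capacity_(capacity) {}
    ~Arena() { clear(); }

    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Constructs a T at the next suitably aligned address; throws std::bad_alloc when full.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        void* p = buffer_.get() + used_;
        std::size_t space = capacity_ - used_;
        if (std::align(alignof(T), sizeof(T), p, space) == nullptr) {
            throw std::bad_alloc();
        }
        T* obj = new (p) T(std::forward<Args>(args)...);
        try {
            // Remember how to destroy this object later, since its type is erased here.
            destructors_.push_back({obj, [](void* o) { static_cast<T*>(o)->~T(); }});
        } catch (...) {
            obj->~T();
            throw;
        }
        used_ = (capacity_ - space) + sizeof(T);  // offset of obj plus its size
        return obj;
    }

    // Destroys every object in reverse creation order and rewinds the buffer.
    void clear() noexcept {
        for (auto it = destructors_.rbegin(); it != destructors_.rend(); ++it) {
            it->destroy(it->object);
        }
        destructors_.clear();
        used_ = 0;
    }

private:
    struct Destructor {
        void* object;
        void (*destroy)(void*);
    };

    std::unique_ptr<std::byte[]> buffer_;
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::vector<Destructor> destructors_;
};
```

In the question's setup this might look like `mEntities.push_back(arena.create<PlayerEntity>());`, with a hypothetical `arena` member that outlives `mEntities`; the pointers must then never be passed to `delete`. Whether this beats plain `new` is exactly what would have to be measured.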

You can segregate objects based on their types. That is, instead of a single vector of polymorphic Entity objects with virtual functions, have N vectors: vector<Bullet>, vector<Monster>, vector<Loot>, and so on. This is less insane than it sounds, for three reasons:

1. Often, you can pull out the entire business of managing one such vector into a dedicated system. So in the end you might even have a vector<System*> where each System has a vector for one kind of thing and updates all those things in a single virtual call (delegating to many statically dispatched calls).
2. You don't need to represent everything ever in this abstraction. Not every little integer needs to be wrapped in its own type of entity.
3. If you go further down this route and take hints from entity component systems, you also gain an alternative to inheritance for code reuse (class Monster : Entity {}; class Skeleton : Monster {};) that plays nicer with the hard-earned cache friendliness.
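A minimal sketch of type-segregated storage combined with the `vector<System*>` idea from the first point. All names (`Bullet`, `Monster`, `MovementSystem`, `World`) are hypothetical, and it uses `std::unique_ptr<System>` instead of raw `System*` to make ownership explicit.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical plain entity types: no virtual functions, stored by value.
struct Bullet  { float x = 0.0f; float vx = 10.0f; };
struct Monster { float x = 0.0f; float vx = 1.0f; int hp = 100; };

// One virtual call per system, not per object.
class System {
public:
    virtual ~System() = default;
    virtual void update(float dt) = 0;
};

// A system that owns one contiguous vector of a single concrete type.
// The loop inside update() is statically dispatched.
template <typename T>
class MovementSystem : public System {
public:
    std::vector<T> items;

    void update(float dt) override {
        for (T& item : items) {
            item.x += item.vx * dt;
        }
    }
};

// Plays the role of the vector of systems, with ownership made explicit.
struct World {
    std::vector<std::unique_ptr<System>> systems;

    void update(float dt) {
        for (auto& system : systems) {
            system->update(dt);
        }
    }
};
```

Each `MovementSystem<T>::update` walks a contiguous `std::vector<T>`, so the per-object work no longer goes through a vtable or a pointer to a separate heap allocation.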

Answer 2

It is not easy because polymorphism doesn't work well with cache coherency.

I think the best you can do is overload the base class's operator new to allocate memory from a pool. But to do this, you need to know the sizes of all derived classes, and after some allocating and deallocating you can end up with memory fragmentation, which will reduce the gain.
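A minimal sketch of that suggestion, assuming every derived class fits in one fixed-size slot. `SlotPool`, `kEntitySlotSize`, `MonsterEntity` and the entity members are hypothetical; the class-specific `operator new`/`operator delete` mechanism itself is standard C++ and is inherited by derived classes. The `static_assert`s make the "know the sizes of all derived classes" requirement explicit. The pool is not thread-safe.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size slot pool. Each slot is SlotSize bytes, rounded up to
// the strictest fundamental alignment so every slot is suitably aligned.
template <std::size_t SlotSize, std::size_t SlotCount>
class SlotPool {
public:
    SlotPool() {
        freeSlots_.reserve(SlotCount);
        // Push in reverse so the first allocations hand out slots 0, 1, 2, ...
        for (std::size_t i = SlotCount; i > 0; --i) {
            freeSlots_.push_back(&storage_[(i - 1) * kStride]);
        }
    }

    void* allocate(std::size_t size) {
        if (size > SlotSize || freeSlots_.empty()) {
            throw std::bad_alloc();
        }
        void* slot = freeSlots_.back();
        freeSlots_.pop_back();
        return slot;
    }

    // Capacity was reserved up front, so this push_back does not reallocate.
    void deallocate(void* slot) noexcept { freeSlots_.push_back(slot); }

private:
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static constexpr std::size_t kStride = (SlotSize + kAlign - 1) / kAlign * kAlign;

    alignas(std::max_align_t) unsigned char storage_[kStride * SlotCount];
    std::vector<void*> freeSlots_;
};

// Must be at least the size of the largest derived class.
constexpr std::size_t kEntitySlotSize = 64;

class Entity {
public:
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;

    // Used by `new PlayerEntity()` etc. and by `delete` through an Entity*.
    static void* operator new(std::size_t size) { return pool().allocate(size); }
    static void operator delete(void* p) noexcept { pool().deallocate(p); }

private:
    static SlotPool<kEntitySlotSize, 1024>& pool() {
        static SlotPool<kEntitySlotSize, 1024> instance;
        return instance;
    }
};

struct PlayerEntity : Entity {
    float x = 0.0f;
    void update(float dt) override { x += 2.0f * dt; }
};

struct MonsterEntity : Entity {
    float x = 0.0f;
    int hp = 10;
    void update(float dt) override { x += dt; }
};

static_assert(sizeof(PlayerEntity) <= kEntitySlotSize, "PlayerEntity does not fit a slot");
static_assert(sizeof(MonsterEntity) <= kEntitySlotSize, "MonsterEntity does not fit a slot");
```

Because all slots have the same size, a freed slot can always be reused, but after some churn, entities that are neighbours in the vector may sit in distant slots; that is the loss of gain described above.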

Answer 3

Take a look at Cachegrind, a tool that simulates how your program interacts with the computer's cache hierarchy.
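For example, you could call both traversal functions below on a large input from a small test program, run it with `valgrind --tool=cachegrind --cache-sim=yes ./your_program`, and compare the per-function miss counts with `cg_annotate cachegrind.out.<pid>` (recent Valgrind releases no longer simulate the cache by default, hence `--cache-sim=yes`). This is a sketch under the assumption that a shuffled vector of separately allocated objects approximates a fragmented `vector<Entity*>`; `Particle` and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

// Hypothetical small object standing in for an entity.
struct Particle { float x, y, z, w; };

// Contiguous: one vector of objects, walked in memory order.
float sumContiguous(const std::vector<Particle>& particles) {
    float sum = 0.0f;
    for (const Particle& p : particles) sum += p.x;
    return sum;
}

// Scattered: separately heap-allocated objects, visited in shuffled order to
// mimic a pointer vector whose order no longer matches allocation order.
std::vector<std::unique_ptr<Particle>> makeScattered(std::size_t count, unsigned seed) {
    std::vector<std::unique_ptr<Particle>> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(std::make_unique<Particle>(Particle{1.0f, 0.0f, 0.0f, 0.0f}));
    }
    std::shuffle(out.begin(), out.end(), std::mt19937(seed));
    return out;
}

// Every iteration follows a pointer to a possibly distant allocation.
float sumScattered(const std::vector<std::unique_ptr<Particle>>& particles) {
    float sum = 0.0f;
    for (const auto& p : particles) sum += p->x;
    return sum;
}
```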

3 answers
Answer 1: score 4, accepted
Answer 2: score 1, 2014-06-11 13:25:01
Answer 3: score 0, 2014-06-11 13:17:19
