C++11 memory pool design pattern?

Question

I have a program that contains a processing phase that needs to use a bunch of different object instances (all allocated on the heap) from a tree of polymorphic types, all eventually derived from a common base class.

As the instances may cyclically reference each other, and do not have a clear owner, I want allocated them with new , handle them with raw pointers, and leave them in memory for the phase (even if they become unreferenced), and then after the phase of the program that uses these instances, I want to delete them all at once.

How I thought to structure it is as follows:

struct B; // common base class

vector<unique_ptr<B>> memory_pool;

struct B
{
    B() { memory_pool.emplace_back(this); }

    virtual ~B() {}
};

struct D : B { ... }

int main()
{
    ...

    // phase begins
    D* p = new D(...);

    ...

    // phase ends
    memory_pool.clear();
    // all B instances are deleted, and pointers invalidated

    ...
}

Apart from being careful that all B instances are allocated with new, and that noone uses any pointers to them after the memory pool is cleared, are there problems with this implementation?

Specifically I am concerned about the fact that the this pointer is used to construct a std::unique_ptr in the base class constructor, before the derived class constructor has completed. Does this result in undefined behaviour? If so is there a workaround?

Answer 1

In case you haven't already, familiarize yourself with Boost.Pool . From the Boost documentation:

What is Pool?

Pool allocation is a memory allocation scheme that is very fast, but limited in its usage. For more information on pool allocation (also called simple segregated storage , see concepts concepts and Simple Segregated Storage .

Why should I use Pool?

Using Pools gives you more control over how memory is used in your program. For example, you could have a situation where you want to allocate a bunch of small objects at one point, and then reach a point in your program where none of them are needed any more. Using pool interfaces, you can choose to run their destructors or just drop them off into oblivion; the pool interface will guarantee that there are no system memory leaks.

When should I use Pool?

Pools are generally used when there is a lot of allocation and deallocation of small objects. Another common usage is the situation above, where many objects may be dropped out of memory.

In general, use Pools when you need a more efficient way to do unusual memory control.

Which pool allocator should I use?

pool_allocator is a more general-purpose solution, geared towards efficiently servicing requests for any number of contiguous chunks.

fast_pool_allocator is also a general-purpose solution but is geared towards efficiently servicing requests for one chunk at a time; it will work for contiguous chunks, but not as well as pool_allocator .

If you are seriously concerned about performance, use fast_pool_allocator when dealing with containers such as std::list , and use pool_allocator when dealing with containers such as std::vector .

Memory management is tricky business (threading, caching, alignment, fragmentation, etc. etc.) For serious production code, well-designed and carefully optimized libraries are the way to go, unless your profiler demonstrates a bottleneck.

Answer 2

Your idea is great and millions of applications are already using it. This pattern is most famously known as «autorelease pool». It forms a base for ”smart” memory management in Cocoa and Cocoa Touch Objective-C frameworks. Despite the fact that C++ provides hell of a lot of other alternatives, I still think this idea got a lot of upside. But there are few things where I think your implementation as it stands may fall short.

The first problem that I can think of is thread safety. For example, what happens when objects of the same base are created from different threads? A solution might be to protect the pool access with mutually exclusive locks. Though I think a better way to do this is to make that pool a thread-specific object.

The second problem is invoking an undefined behavior in case where derived class's constructor throws an exception. You see, if that happens, the derived object won't be constructed, but your B 's constructor would have already pushed a pointer to this to the vector. Later on, when the vector is cleared, it would try to call a destructor through a virtual table of the object that either doesn't exist or is in fact a different object (because new could reuse that address).

The third thing I don't like is that you have only one global pool, even if it is thread-specific, that just doesn't allow for a more fine grained control over the scope of allocated objects.

Taking the above into account, I would do a couple of improvements:

Have a stack of pools for more fine-grained scope control.
Make that pool stack a thread-specific object.
In case of failures (like exception in derived class constructor), make sure the pool doesn't hold a dangling pointer.

Here is my literally 5 minutes solution, don't judge for quick and dirty:

#include <new>
#include <set>
#include <stack>
#include <cassert>
#include <memory>
#include <stdexcept>
#include <iostream>

#define thread_local __thread // Sorry, my compiler doesn't C++11 thread locals

struct AutoReleaseObject {
    AutoReleaseObject();
    virtual ~AutoReleaseObject();
};

class AutoReleasePool final {
  public:
    AutoReleasePool() {
        stack_.emplace(this);
    }

    ~AutoReleasePool() noexcept {
        std::set<AutoReleaseObject *> obj;
        obj.swap(objects_);
        for (auto *p : obj) {
            delete p;
        }
        stack_.pop();
    }

    static AutoReleasePool &instance() {
        assert(!stack_.empty());
        return *stack_.top();
    }

    void add(AutoReleaseObject *obj) {
        objects_.insert(obj);
    }

    void del(AutoReleaseObject *obj) {
        objects_.erase(obj);
    }

    AutoReleasePool(const AutoReleasePool &) = delete;
    AutoReleasePool &operator = (const AutoReleasePool &) = delete;

  private:
    // Hopefully, making this private won't allow users to create pool
    // not on stack that easily... But it won't make it impossible of course.
    void *operator new(size_t size) {
        return ::operator new(size);
    }

    std::set<AutoReleaseObject *> objects_;

    struct PrivateTraits {};

    AutoReleasePool(const PrivateTraits &) {
    }

    struct Stack final : std::stack<AutoReleasePool *> {
        Stack() {
            std::unique_ptr<AutoReleasePool> pool
                (new AutoReleasePool(PrivateTraits()));
            push(pool.get());
            pool.release();
        }

        ~Stack() {
            assert(!stack_.empty());
            delete stack_.top();
        }
    };

    static thread_local Stack stack_;
};

thread_local AutoReleasePool::Stack AutoReleasePool::stack_;

AutoReleaseObject::AutoReleaseObject()
{
    AutoReleasePool::instance().add(this);
}

AutoReleaseObject::~AutoReleaseObject()
{
    AutoReleasePool::instance().del(this);
}

// Some usage example...

struct MyObj : AutoReleaseObject {
    MyObj() {
        std::cout << "MyObj::MyObj(" << this << ")" << std::endl;
    }

    ~MyObj() override {
        std::cout << "MyObj::~MyObj(" << this << ")" << std::endl;
    }

    void bar() {
        std::cout << "MyObj::bar(" << this << ")" << std::endl;
    }
};

struct MyObjBad final : AutoReleaseObject {
    MyObjBad() {
        throw std::runtime_error("oops!");
    }

    ~MyObjBad() override {
    }
};

void bar()
{
    AutoReleasePool local_scope;
    for (int i = 0; i < 3; ++i) {
        auto o = new MyObj();
        o->bar();
    }
}

void foo()
{
    for (int i = 0; i < 2; ++i) {
        auto o = new MyObj();
        bar();
        o->bar();
    }
}

int main()
{
    std::cout << "main start..." << std::endl;
    foo();
    std::cout << "main end..." << std::endl;
}

Answer 3

Hmm, I needed almost exactly the same thing recently (memory pool for one phase of a program that gets cleared all at once), except that I had the additional design constraint that all my objects would be fairly small.

I came up with the following "small-object memory pool" -- perhaps it will be of use to you:

#pragma once

#include "defs.h"
#include <cstdint>      // uintptr_t
#include <cstdlib>      // std::malloc, std::size_t
#include <type_traits>  // std::alignment_of
#include <utility>      // std::forward
#include <algorithm>    // std::max
#include <cassert>      // assert


// Small-object allocator that uses a memory pool.
// Objects constructed in this arena *must not* have delete called on them.
// Allows all memory in the arena to be freed at once (destructors will
// be called).
// Usage:
//     SmallObjectArena arena;
//     Foo* foo = arena::create<Foo>();
//     arena.free();        // Calls ~Foo
class SmallObjectArena
{
private:
    typedef void (*Dtor)(void*);

    struct Record
    {
        Dtor dtor;
        short endOfPrevRecordOffset;    // Bytes between end of previous record and beginning of this one
        short objectOffset;             // From the end of the previous record
    };

    struct Block
    {
        size_t size;
        char* rawBlock;
        Block* prevBlock;
        char* startOfNextRecord;
    };

    template<typename T> static void DtorWrapper(void* obj) { static_cast<T*>(obj)->~T(); }

public:
    explicit SmallObjectArena(std::size_t initialPoolSize = 8192)
        : currentBlock(nullptr)
    {
        assert(initialPoolSize >= sizeof(Block) + std::alignment_of<Block>::value);
        assert(initialPoolSize >= 128);

        createNewBlock(initialPoolSize);
    }

    ~SmallObjectArena()
    {
        this->free();
        std::free(currentBlock->rawBlock);
    }

    template<typename T>
    inline T* create()
    {
        return new (alloc<T>()) T();
    }

    template<typename T, typename A1>
    inline T* create(A1&& a1)
    {
        return new (alloc<T>()) T(std::forward<A1>(a1));
    }

    template<typename T, typename A1, typename A2>
    inline T* create(A1&& a1, A2&& a2)
    {
        return new (alloc<T>()) T(std::forward<A1>(a1), std::forward<A2>(a2));
    }

    template<typename T, typename A1, typename A2, typename A3>
    inline T* create(A1&& a1, A2&& a2, A3&& a3)
    {
        return new (alloc<T>()) T(std::forward<A1>(a1), std::forward<A2>(a2), std::forward<A3>(a3));
    }

    // Calls the destructors of all currently allocated objects
    // then frees all allocated memory. Destructors are called in
    // the reverse order that the objects were constructed in.
    void free()
    {
        // Destroy all objects in arena, and free all blocks except
        // for the initial block.
        do {
            char* endOfRecord = currentBlock->startOfNextRecord;
            while (endOfRecord != reinterpret_cast<char*>(currentBlock) + sizeof(Block)) {
                auto startOfRecord = endOfRecord - sizeof(Record);
                auto record = reinterpret_cast<Record*>(startOfRecord);
                endOfRecord = startOfRecord - record->endOfPrevRecordOffset;
                record->dtor(endOfRecord + record->objectOffset);
            }

            if (currentBlock->prevBlock != nullptr) {
                auto memToFree = currentBlock->rawBlock;
                currentBlock = currentBlock->prevBlock;
                std::free(memToFree);
            }
        } while (currentBlock->prevBlock != nullptr);
        currentBlock->startOfNextRecord = reinterpret_cast<char*>(currentBlock) + sizeof(Block);
    }

private:
    template<typename T>
    static inline char* alignFor(char* ptr)
    {
        const size_t alignment = std::alignment_of<T>::value;
        return ptr + (alignment - (reinterpret_cast<uintptr_t>(ptr) % alignment)) % alignment;
    }

    template<typename T>
    T* alloc()
    {
        char* objectLocation = alignFor<T>(currentBlock->startOfNextRecord);
        char* nextRecordStart = alignFor<Record>(objectLocation + sizeof(T));
        if (nextRecordStart + sizeof(Record) > currentBlock->rawBlock + currentBlock->size) {
            createNewBlock(2 * std::max(currentBlock->size, sizeof(T) + sizeof(Record) + sizeof(Block) + 128));
            objectLocation = alignFor<T>(currentBlock->startOfNextRecord);
            nextRecordStart = alignFor<Record>(objectLocation + sizeof(T));
        }
        auto record = reinterpret_cast<Record*>(nextRecordStart);
        record->dtor = &DtorWrapper<T>;
        assert(objectLocation - currentBlock->startOfNextRecord < 32768);
        record->objectOffset = static_cast<short>(objectLocation - currentBlock->startOfNextRecord);
        assert(nextRecordStart - currentBlock->startOfNextRecord < 32768);
        record->endOfPrevRecordOffset = static_cast<short>(nextRecordStart - currentBlock->startOfNextRecord);
        currentBlock->startOfNextRecord = nextRecordStart + sizeof(Record);

        return reinterpret_cast<T*>(objectLocation);
    }

    void createNewBlock(size_t newBlockSize)
    {
        auto raw = static_cast<char*>(std::malloc(newBlockSize));
        auto blockStart = alignFor<Block>(raw);
        auto newBlock = reinterpret_cast<Block*>(blockStart);
        newBlock->rawBlock = raw;
        newBlock->prevBlock = currentBlock;
        newBlock->startOfNextRecord = blockStart + sizeof(Block);
        newBlock->size = newBlockSize;
        currentBlock = newBlock;
    }

private:
    Block* currentBlock;
};

To answer your question, you're not invoking undefined behaviour since nobody is using the pointer until the object is fully constructed (the pointer value itself is safe to copy around until then). However, it's a rather intrusive method, as the object(s) themselves need to know about the memory pool. Additionally, if you're constructing a large number of small objects, it would likely be faster to use an actual pool of memory (like my pool does) instead of calling out to new for every object.

Whatever pool-like approach you use, be careful that the objects are never manually delete ed, because that would lead to a double free!

Answer 4

I still think this is an interesting question without a definitive reply, but please let me break it down into the different questions you are actually asking:

1.) Does inserting a pointer to a base class into a vector before initialisation of a subclass prevent or cause issues with retrieving inherited classes from that pointer. [slicing for example.]

Answer: No, so long as you are 100% sure of the relevant type that is being pointed to, this mechanism does not cause these issues however note the following points:

If the derived constructor fails, you are left with an issue later when you are likely to have a dangling pointer at least sitting in the vector, as that address space it [the derived class] thought it was getting would be freed to the operating environment on failure, but the vector still has the address as being of the base class type.

Note that a vector, although kind of useful, is not the best structure for this, and even if it was, there should be some inversion of control involved here to allow the vector object to control initialisation of your objects, so that you have awareness of success/failure.

These points lead to the implied 2nd question:

2.) Is this a good pattern for pooling?

Answer: Not really, for the reasons mentioned above, plus others (Pushing a vector past it's end point basically ends up with a malloc which is unnecessary and will impact performance.) Ideally you want to use a pooling library, or a template class, and even better, separate the allocation/de-allocation policy implementation away from the pool implementation, with a low level solution already being hinted at, which is to allocate adequate pool memory from pool initialisation, and then use this using pointers to void from within the pool address space (See Alex Zywicki's solution above.) Using this pattern, the pool destruction is safe as the pool which will be contiguous memory can be destroyed en masse without any dangling issues, or memory leaks through losing all references to an object (losing all reference to an object whose address is allocated through the pool by the storage manager leaves you with dirty chunk/s, but will not cause a memory leak as it is managed by the pool impl ementation.

In the early days of C/C++ (before mass proliferation of the STL), this was a well discussed pattern and many implementations and designs can be found out there in good literature: As an example:

Knuth (1973 The art of computer programming: Multiple volumes), and for a more complete list, with more on pooling, see:

http://www.ibm.com/developerworks/library/l-memory/

The 3rd implied question seems to be:

3) Is this a valid scenario to use pooling?

Answer: This is a localised design decision based on what you are comfortable with, but to be honest, your implementation (no controlling structure/aggregate, possibly cyclic sharing of sub sets of objects) suggests to me that you would be better off with a basic linked list of wrapper objects, each of which contains a pointer to your superclass, used only for addressing purposes. Your cyclical structures are built on top of this, and you simply amend/grow shrink the list as required to accommodate all of your first class objects as required, and when finished, you can then easily destroy them in effectively an O(1) operation from within the linked list.

Having said that, I would personally recommend that at this time (when you have a scenario where pooling does have a use and so you are in the right mind-set) to carry out the building of a storage management/pooling set of classes that are paramaterised/typeless now as it will hold you in good stead for the future.

Answer 5

This sounds what I have heard called a Linear Allocator. I will explain the basics of how I understand how it works.

Allocate a block of memory using ::operator new(size);
Have a void* that is your Pointer to the next free space in memory.
You will have an alloc(size_t size) function that will give you a pointer to the location in the block from step one for you to construct on to using Placement New
Placement new looks like... int* i = new(location)int(); where location is a void* to a block of memory you alloced from the allocator.
when you are done with all of your memory you will call a Flush() function that will dealloc the memory from the pool or at least wipe the data clean.

I programmed one of these recently and i will post my code here for you as well as do my best to explain.

    #include <iostream>
    class LinearAllocator:public ObjectBase
    {
    public:
        LinearAllocator();
        LinearAllocator(Pool* pool,size_t size);
        ~LinearAllocator();
        void* Alloc(Size_t size);
        void Flush();
    private:
        void** m_pBlock;
        void* m_pHeadFree;
        void* m_pEnd;
    };

don't worry about what i'm inheriting from. i have been using this allocator in conjunction with a memory pool. but basically instead of getting the memory from operator new i am getting memory from a memory pool. the internal workings are the same essentially.

Here is the implementation:

LinearAllocator::LinearAllocator():ObjectBase::ObjectBase()
{
    m_pBlock = nullptr;
    m_pHeadFree = nullptr;
    m_pEnd=nullptr;
}

LinearAllocator::LinearAllocator(Pool* pool,size_t size):ObjectBase::ObjectBase(pool)
{
    if (pool!=nullptr) {
        m_pBlock = ObjectBase::AllocFromPool(size);
        m_pHeadFree = * m_pBlock;
        m_pEnd = (void*)((unsigned char*)*m_pBlock+size);
    }
    else{
        m_pBlock = nullptr;
        m_pHeadFree = nullptr;
        m_pEnd=nullptr;
    }
}
LinearAllocator::~LinearAllocator()
{
    if (m_pBlock!=nullptr) {
        ObjectBase::FreeFromPool(m_pBlock);
    }
    m_pBlock = nullptr;
    m_pHeadFree = nullptr;
    m_pEnd=nullptr;
}
MemoryBlock* LinearAllocator::Alloc(size_t size)
{
    if (m_pBlock!=nullptr) {
        void* test = (void*)((unsigned char*)m_pEnd-size);
        if (m_pHeadFree<=test) {
            void* temp = m_pHeadFree;
            m_pHeadFree=(void*)((unsigned char*)m_pHeadFree+size);
            return temp;
        }else{
            return nullptr;
        }
    }else return nullptr;
}
void LinearAllocator::Flush()
{
    if (m_pBlock!=nullptr) {
        m_pHeadFree=m_pBlock;
        size_t size = (unsigned char*)m_pEnd-(unsigned char*)*m_pBlock;
        memset(*m_pBlock,0,size);
    }
}

This code is fully functional except for a few lines which will need to be changed because of my inheritance and use of the memory pool. but I bet you can figure out what needs to change and just let me know if you need a hand changing the code. This code has not been tested in any sort of professional manor and is not guaranteed to be thread safe or anything fancy like that. i just whipped it up and thought i could share it with you since you seemed to need help.

I also have a working implementation of a fully generic memory pool if you think it may help you. I can explain how it works if you need.

Once again if you need any help let me know. Good luck.

C++11 memory pool design pattern?

Question

5 answers

solution1
15 2013-05-04 20:51:14

solution2
14

solution3
4 2013-05-04 20:34:24

solution4
3

solution5
2 2014-01-25 09:51:01

C++11 memory pool design pattern?

Question

5 answers

solution1 15 2013-05-04 20:51:14

solution2 14

solution3 4 2013-05-04 20:34:24

solution4 3

solution5 2 2014-01-25 09:51:01

solution1
15 2013-05-04 20:51:14

solution2
14

solution3
4 2013-05-04 20:34:24

solution4
3

solution5
2 2014-01-25 09:51:01