Simulate thread local variables

Question

I want to simulate thread-local variables for non-static members, something like this:

template< typename T, unsigned int tNumThread >
class ThreadLocal
{
protected:
    T mData[tNumThread];  // one slot per thread, sized at compile time

    unsigned int _getThreadIndex()
    {
        return ...; // I have a thread pool and each thread has an index from 0 to n
    }

public:
    ThreadLocal() {}
    ~ThreadLocal() {}

    // operator-> must return a pointer (or an object that itself has operator->)
    T* operator ->()
    {
        return &mData[_getThreadIndex()];
    }
    ...
};

The problem is that the number of threads is only determined at runtime, so I must allocate mData on the heap.

Is there any way to avoid heap allocation and use a regular array like the one above?

Answer 1

Each thread has its own stack, and remember that stack frames are popped (or can be thought of as popped) when a function returns.

This is why we have "Stackless Python": a single stack (which Python needs) and multiple threads do not play nicely together (see the global interpreter lock).

You could put the array in main, where it lives for the whole program, but remember that C++ needs to know array sizes at compile time (standard C++ has no variable-length arrays), so if the thread count isn't fixed at compile time, there's no way to size the array.

What you really want is something not in main, but it couldn't be a template parameterized on the thread count, because that number cannot be known at compile time.
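If a compile-time upper bound on the pool size is acceptable, one way to keep a plain array is to template on that maximum and check the real count at runtime. A minimal sketch, assuming a hypothetical BoundedThreadLocal class whose get() takes the index that the question's _getThreadIndex() would return:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical compromise: MaxThreads is a compile-time upper bound, while the
// actual number of threads is only known (and checked) at runtime.
template <typename T, std::size_t MaxThreads>
class BoundedThreadLocal {
public:
    explicit BoundedThreadLocal(std::size_t numThreads) : mCount(numThreads) {
        if (numThreads > MaxThreads)
            throw std::length_error("thread count exceeds compile-time capacity");
    }

    // threadIndex stands in for the question's _getThreadIndex().
    T& get(std::size_t threadIndex) {
        assert(threadIndex < mCount);  // debug-only bounds check, like raw array indexing
        return mData[threadIndex];
    }

    std::size_t count() const { return mCount; }

private:
    std::array<T, MaxThreads> mData{};  // fixed storage, no heap allocation
    std::size_t mCount;
};
```

The cost is that MaxThreads slots are always reserved, even if the pool turns out to be smaller.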

GCC also provides a malloc-like function that allocates on the stack (alloca), but it is better to avoid it: its memory is released as soon as the calling function returns, and it can get in the way of optimisations (GCC, for example, will not normally inline a function that calls it).

Equally, don't underestimate your CPU's prefetching and GCC's optimisations: putting the array on the heap isn't bad.
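To illustrate why the heap is not a problem here: the allocation happens once, when the object is built, and every access afterwards is plain indexing, exactly as with the fixed array. A minimal sketch, assuming hypothetical names ThreadLocalHeap and sum_per_thread_counts, and passing the thread index explicitly in place of the question's thread-pool lookup:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runtime-sized variant of the question's ThreadLocal (hypothetical name).
template <typename T>
class ThreadLocalHeap {
public:
    // The only heap allocation: sized once, never resized while threads run.
    explicit ThreadLocalHeap(std::size_t numThreads) : mData(numThreads) {}

    // threadIndex plays the role of the question's _getThreadIndex().
    T& get(std::size_t threadIndex) {
        assert(threadIndex < mData.size());
        return mData[threadIndex];
    }

    std::size_t size() const { return mData.size(); }

private:
    std::vector<T> mData;
};

// Each worker increments only its own slot, so no locking is needed.
long sum_per_thread_counts(std::size_t numThreads, int iterations) {
    ThreadLocalHeap<long> counters(numThreads);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < numThreads; ++i) {
        workers.emplace_back([&counters, i, iterations] {
            for (int k = 0; k < iterations; ++k) ++counters.get(i);
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (std::size_t i = 0; i < counters.size(); ++i) total += counters.get(i);
    return total;
}
```

Adjacent slots can share a cache line, so heavy concurrent writes may suffer from false sharing; padding each slot avoids that.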

An interesting read with good pictures, though only distantly related to the topic: http://www.nongnu.org/avr-libc/user-manual/malloc.html

Answer 2

I suggest:

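std::unordered_map<std::thread::id, T, std::hash<std::thread::id>, std::equal_to<std::thread::id>, stackalloc> myTLS;  // the allocator is the 5th template argument (the 3rd is the hash)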

Either you lock globally around all accesses, or you populate the map in advance and only read it afterwards.
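The "populate in advance, then read-only" option could look like the sketch below. It uses the default allocator rather than a stack allocator, and run_prepopulated_map is a hypothetical name. The workers are held back by a std::shared_future until the main thread has inserted every slot; after that, lookups do not modify the map's structure and each thread writes only to its own element, so no lock is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <unordered_map>
#include <vector>

long run_prepopulated_map(std::size_t numThreads, int iterations) {
    std::unordered_map<std::thread::id, long> tls;

    std::promise<void> ready;
    std::shared_future<void> go = ready.get_future().share();

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < numThreads; ++i) {
        workers.emplace_back([&tls, go, iterations] {
            go.wait();  // do not touch the map until it is fully populated
            // at() only looks the key up; each thread then writes its own element.
            long& mine = tls.at(std::this_thread::get_id());
            for (int k = 0; k < iterations; ++k) ++mine;
        });
    }

    // Insert every slot before any worker reads the map.
    for (auto& w : workers) tls.emplace(w.get_id(), 0L);
    ready.set_value();  // the inserts above happen-before the workers' lookups

    for (auto& w : workers) w.join();

    long total = 0;
    for (const auto& kv : tls) total += kv.second;
    return total;
}
```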

You can use it in combination with a stack allocator.

typedef short_alloc<std::pair<const std::thread::id, T>, maxthrds> stackalloc;  // in Hinnant's short_alloc, this second argument is the arena size in bytes

https://howardhinnant.github.io/stack_alloc.html
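If C++17 is available, a similar effect can be had with the standard std::pmr facilities instead of short_alloc; this is a swapped-in technique, not the linked allocator. In the sketch below, a monotonic_buffer_resource hands out memory from a local buffer, and null_memory_resource() as its upstream makes an overflow throw std::bad_alloc rather than silently falling back to the heap. The 4096-byte buffer size and the name count_registered_threads are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <thread>
#include <unordered_map>
#include <vector>

std::size_t count_registered_threads(std::size_t numThreads) {
    // Stack buffer backing every allocation the map makes (nodes and buckets).
    std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource arena(buffer.data(), buffer.size(),
                                              std::pmr::null_memory_resource());
    std::pmr::unordered_map<std::thread::id, long> tls(&arena);
    tls.reserve(numThreads);  // allocate the bucket array once, up front

    std::vector<std::thread> workers;  // the thread handles themselves still use the heap
    for (std::size_t i = 0; i < numThreads; ++i) workers.emplace_back([] {});
    // get_id() stays valid and unique until each thread is joined.
    for (auto& w : workers) tls.emplace(w.get_id(), 0L);
    for (auto& w : workers) w.join();

    return tls.size();
}
```

A monotonic resource never reuses freed memory, so repeated rehashing eats into the buffer; reserving the final size first keeps the usage predictable.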

Alternatively, you can pad each thread's slot to its own block of memory:

template <typename T>
struct alignas(128) Padder
{
    T t;  // alignas(128) rounds sizeof(Padder) up to a multiple of 128 for any sizeof(T), so no manual padding array is needed
};

std::array<Padder<T>, maxthreads> myTLS;

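The padding keeps each thread's slot on its own cache line, which avoids false sharing (cache lines bouncing between cores under the MESI coherence protocol).
With this method, you have to track each thread's index into this array yourself.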
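A runnable sketch of the padded-array idea, assuming a hypothetical compile-time bound kMaxThreads and hypothetical names PaddedSlot and run_padded_counters. Each thread is handed its index when it is started, which is the bookkeeping the answer says you have to do yourself; the 128-byte alignment assumes cache lines of at most 128 bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kMaxThreads = 8;  // hypothetical compile-time upper bound

// Each slot occupies its own 128-byte block, so two threads never write to
// the same cache line.
template <typename T>
struct alignas(128) PaddedSlot {
    T value{};
};

static_assert(sizeof(PaddedSlot<long>) == 128, "each slot fills one 128-byte block");

long run_padded_counters(std::size_t numThreads, int iterations) {
    assert(numThreads <= kMaxThreads);
    std::array<PaddedSlot<long>, kMaxThreads> slots{};  // fixed storage, no heap allocation

    std::vector<std::thread> workers;  // only the thread handles use the heap
    for (std::size_t index = 0; index < numThreads; ++index) {
        // The caller assigns each thread its index into the array.
        workers.emplace_back([&slots, index, iterations] {
            for (int k = 0; k < iterations; ++k) ++slots[index].value;
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (std::size_t i = 0; i < numThreads; ++i) total += slots[i].value;
    return total;
}
```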

solution1
1 2013-10-24 11:05:38

solution2
0 2016-06-15 03:25:04