C++11 register cache thread safety

Question

In "volatile: The Multithreaded Programmer's Best Friend", Andrei Alexandrescu gives this example:

class Gadget
{
public:
    void Wait()
    {
        while (!flag_)
        {
            Sleep(1000); // sleeps for 1000 milliseconds
        }
    }
    void Wakeup()
    {
        flag_ = true;
    }
    ...
private:
    bool flag_;
};

He states:

... the compiler concludes that it can cache flag_ in a register ... it harms correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever. This is because the change of flag_ will not be reflected in the register that caches flag_.

Then he offers a solution:

If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable.

Now, other people have mentioned on Stack Overflow and elsewhere that the volatile keyword doesn't really offer any thread-safety guarantees, and that I should use std::atomic or mutex synchronization instead, which I agree with.
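For reference, here is a minimal sketch of the Gadget class rewritten along those lines. It assumes std::atomic<bool> replaces the plain bool flag_ and that std::this_thread::sleep_for stands in for the Windows-only Sleep call; the DemoGadget driver is hypothetical and only exercises Wait() and Wakeup() from two threads.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

class Gadget
{
public:
    // Polls until another thread calls Wakeup(). Mainstream compilers emit a
    // fresh atomic load of flag_ on every iteration instead of keeping a copy
    // in a register for the whole loop.
    void Wait()
    {
        while (!flag_.load(std::memory_order_acquire))
        {
            // Stand-in for Sleep(1000), shortened so the demo finishes quickly.
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    void Wakeup()
    {
        flag_.store(true, std::memory_order_release);
    }

private:
    std::atomic<bool> flag_{false};
};

// Hypothetical usage: a second thread blocks in Wait() until the calling
// thread invokes Wakeup().
bool DemoGadget()
{
    Gadget g;
    bool returned = false;
    std::thread waiter([&] {
        g.Wait();
        returned = true;
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    g.Wakeup();
    waiter.join();  // join() makes the write to 'returned' visible here
    assert(returned);
    return returned;
}
```

Using memory_order_acquire/release rather than the default sequentially consistent ordering is a choice for this sketch; the default would work too.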

However, going the std::atomic route, for example, which internally uses read-acquire and write-release memory fences (Acquire and Release Semantics), I don't see how it actually fixes the register-caching problem in particular.

In the case of x86, for example, every load on x86-64 already implies acquire semantics and every store implies release semantics, so the compiled code for acquire loads and release stores on x86 doesn't contain any actual memory-barrier instructions at all (The Purpose of memory_order_consume in C++11):

g = Guard.load(memory_order_acquire);
if (g != 0)
    p = Payload;

On Intel x86-64, the Clang compiler generates compact machine code for this example – one machine instruction per line of C++ source code. This family of processors features a strong memory model, so the compiler doesn't need to emit special memory barrier instructions to implement the read-acquire.

So, assuming x86 for now, how does std::atomic solve the register-caching problem? With no memory-barrier instructions for the read-acquire in the compiled code, it seems to be the same as the compiled code for a plain read.

Answer 1

Did you notice that the compiled code contained no load from just a register? There was an explicit memory load from _Guard, so it did in fact prevent caching in a register.

How it does this is up to the specific platform's implementation of std::atomic, but it must do it.
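A self-contained sketch of the message-passing pattern that the quoted Guard/Payload snippet belongs to may make this concrete. The Payload value 42 and the function names Producer, Consumer and RunMessagePassing are made up for illustration. The point is that the spin loop re-reads Guard from memory on each iteration, even though on x86-64 that read is typically a plain mov with no fence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> Guard{0};  // publication flag, as in the quoted snippet
int Payload = 0;            // plain (non-atomic) data published via Guard

void Producer()
{
    Payload = 42;                               // write the data first
    Guard.store(1, std::memory_order_release);  // then publish it
}

int Consumer()
{
    // Compilers emit a fresh load of Guard on each iteration; on x86-64
    // it is typically an ordinary mov instruction with no barrier.
    while (Guard.load(std::memory_order_acquire) == 0)
    {
        std::this_thread::yield();
    }
    // The acquire load read the value written by the release store, so the
    // two synchronize and this plain read is guaranteed to see 42.
    return Payload;
}

int RunMessagePassing()
{
    int result = 0;
    std::thread consumer([&result] { result = Consumer(); });
    std::thread producer(Producer);
    producer.join();
    consumer.join();
    assert(result == 42);
    return result;
}
```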

And, by the way, Alexandrescu's reasoning is completely wrong for modern platforms. While it's true that volatile prevents the compiler from caching in a register, it doesn't prevent similar caching being done by the CPU or by hardware. On some platforms, it might happen to be adequate, but there is absolutely no reason to write gratuitously non-portable code that might break on a future CPU, compiler, library, or platform when a fully-portable alternative is readily available.

Answer 2

volatile is not necessary for any "sane" implementation when the Gadget example is changed to use std::atomic<bool>. The reason is not that the standard forbids the use of registers; instead, it says (§29.3/13 in N3690):

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

Of course, what constitutes "reasonable" is open to interpretation, and it's "should", not "shall", so an implementation might ignore the requirement without violating the letter of the standard. Typical implementations do not cache the results of atomic loads, nor (much) delay issuing an atomic store to the CPU, and thus leave the decision largely to the hardware. If you would like to enforce this behavior, you should use volatile std::atomic<bool> instead. In both cases, however, if another thread sets the flag, Wait() should return in finite time, but if your compiler and/or CPU are so inclined, it can still take much longer than you would like.
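A sketch of the volatile std::atomic<bool> variant, assuming the same shape as the question's Gadget (the class name VolatileGadget and the DemoVolatileGadget driver are hypothetical). std::atomic provides volatile-qualified overloads of load() and store(), so the calls below are valid. C++20 deprecates those volatile overloads only for types whose is_always_lock_free is false, which is not the case for bool on mainstream platforms.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical Gadget variant whose flag is volatile std::atomic<bool>:
// std::atomic makes the cross-thread access well defined, and volatile
// tells the compiler every access is observable, so none may be elided.
class VolatileGadget
{
public:
    void Wait()
    {
        // Calls the volatile-qualified overload of load().
        while (!flag_.load(std::memory_order_acquire))
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    void Wakeup()
    {
        // Calls the volatile-qualified overload of store().
        flag_.store(true, std::memory_order_release);
    }

private:
    volatile std::atomic<bool> flag_{false};
};

bool DemoVolatileGadget()
{
    VolatileGadget g;
    bool returned = false;
    std::thread waiter([&] {
        g.Wait();
        returned = true;
    });
    g.Wakeup();
    waiter.join();  // join() makes the write to 'returned' visible here
    assert(returned);
    return returned;
}
```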

Also note that a memory fence does not guarantee that a store becomes visible to another thread immediately, or any sooner than it otherwise would. So even if the compiler added fence instructions to Gadget's methods, they wouldn't help at all. Fences are used to guarantee consistency, not to increase performance.
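To illustrate that fences order memory operations rather than speed them up, here is a sketch of a publish/consume handshake written with std::atomic_thread_fence and relaxed atomics. The names Data, Ready, Writer, Reader and RunFenceDemo and the value 7 are hypothetical. The fences only guarantee that once the reader has observed Ready == true, it also observes Data == 7; nothing in them makes the store to Ready arrive any sooner.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int Data = 0;                    // plain payload
std::atomic<bool> Ready{false};  // flag written and read with relaxed ordering

void Writer()
{
    Data = 7;
    // Release fence: orders the write to Data before the relaxed store below.
    std::atomic_thread_fence(std::memory_order_release);
    Ready.store(true, std::memory_order_relaxed);
}

int Reader()
{
    // This loop may spin for however long the store takes to become
    // visible; the fences do nothing to shorten that wait.
    while (!Ready.load(std::memory_order_relaxed))
    {
        std::this_thread::yield();
    }
    // Acquire fence: pairs with the release fence now that Ready == true
    // has been observed, so the read of Data below must see 7.
    std::atomic_thread_fence(std::memory_order_acquire);
    return Data;
}

int RunFenceDemo()
{
    int seen = 0;
    std::thread reader([&seen] { seen = Reader(); });
    std::thread writer(Writer);
    writer.join();
    reader.join();
    assert(seen == 7);
    return seen;
}
```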

2 answers: Answer 1 (score 5, accepted, 2015-09-10 08:14:44); Answer 2 (score 1, 2015-09-10 14:42:14)
