How do I safely write a test and test-and-set (TATAS) spinlock with C++11 for x86(-64)?

Question

I'm currently working on a Spinlock class and trying to make it as reasonable as possible, mostly based on advice here: https://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures

The work-in-progress looks like this:

#include <atomic>
#include <immintrin.h> // _mm_pause

class Spinlock
{
public:
    Spinlock() : m_lock(false) {}

    void lock()
    {
        // heavy test here for the usual uncontested case
        bool exp = false;
        if (!m_lock.compare_exchange_weak(exp, true, std::memory_order_acquire))
        {
            int failCount = 0;
            for (;;)
            {
                // processor spin loop hint
                _mm_pause();

                // spin on mov instead of lock instruction
                if (!m_lock.load(std::memory_order_relaxed))
                {
                    // heavy test now that we suspect success
                    exp = false;
                    if (m_lock.compare_exchange_weak(exp, true, std::memory_order_acquire))
                    {
                        return;
                    }
                }

                // backoff (potentially make exponential later)
                if (++failCount == SOME_VALUE)
                {
                    // Yield somehow.
                    failCount = 0;
                }
            }
        }
    }

    void unlock()
    {
        m_lock.store(false, std::memory_order_release);
    }

    std::atomic_bool m_lock;
};

However, it seems like having that relaxed read in there can theoretically allow generated code to do unexpected things like create deadlocks: http://joeduffyblog.com/2009/02/23/the-magical-dueling-deadlocking-spin-locks/

This code shouldn't deadlock in the same way as the linked example because the outer acquire should keep the relaxed load from drifting behind, but I don't really have a handle on all the code transformations that could exist. What memory orders and/or fences do I need to keep this code safe without losing performance? Is it possible for a backoff implementation to occur significantly more or less frequently (> a few loops) than intended because the surrounding memory orders are too relaxed?

On a related note, why are spinlock examples around the web using acquire/release memory order for spinlocks instead of sequentially consistent? I found a comment saying that allowing a spinlock release to cross a later spinlock acquire could lead to problems: http://preshing.com/20120913/acquire-and-release-semantics/#IDComment721195810

Answer 1

> This code shouldn't deadlock in the same way as the linked example because the outer acquire should keep the relaxed load from drifting behind, but I don't really have a handle on all the code transformations that could exist.

An acquire operation guarantees that neither the compiler nor the CPU will reorder subsequent reads and writes before it. The relaxed load inside your loop therefore cannot move ahead of the initial compare_exchange_weak.
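To make that guarantee concrete, here is a minimal sketch of an acquire load pairing with a release store. The function name `message_passing_demo` and the variables `payload`, `ready`, and `seen` are made up for illustration and do not come from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: publish a plain int through an acquire/release flag.
int message_passing_demo()
{
    int payload = 0;                    // ordinary, non-atomic data
    std::atomic<bool> ready(false);     // publication flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                   // (1) write the data
        ready.store(true, std::memory_order_release);   // (2) publish it
    });

    std::thread consumer([&] {
        // (3) Acquire load: later reads cannot be moved before it.
        while (!ready.load(std::memory_order_acquire))
        {
            std::this_thread::yield();
        }
        seen = payload;                 // (4) ordered after (3), so it reads 42
    });

    producer.join();
    consumer.join();

    // Both threads have finished, so the flag must be set by now.
    assert(ready.load(std::memory_order_relaxed));
    return seen;
}
```

If the load in step (3) were relaxed instead, the read in step (4) would race with the write in step (1) as far as the C++ memory model is concerned. A lock's acquire plays the same role for the data it protects.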

> What memory orders and/or fences do I need to keep this code safe without losing performance?

You do not need any extra synchronization here; your code does the right thing.
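For reference, a self-contained variant of the lock might look like the sketch below. It is not the question's code verbatim: the class name `TatasSpinlock`, the helper `cpu_relax`, the threshold `kSpinsBeforeYield` (64 is an arbitrary guess), the choice of `std::this_thread::yield()` as the backoff, and the test function `count_with_spinlock` are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
inline void cpu_relax() { _mm_pause(); }   // x86 spin-loop hint
#else
inline void cpu_relax() {}                 // no hint on other targets in this sketch
#endif

const int kSpinsBeforeYield = 64;          // hypothetical backoff threshold

class TatasSpinlock
{
public:
    TatasSpinlock() : m_lock(false) {}

    void lock()
    {
        int failCount = 0;
        for (;;)
        {
            // Test-and-set: the only locked instruction in the loop.
            bool expected = false;
            if (m_lock.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
            {
                return;
            }

            // Test: spin on a plain load until the lock looks free.
            while (m_lock.load(std::memory_order_relaxed))
            {
                cpu_relax();
                if (++failCount == kSpinsBeforeYield)
                {
                    std::this_thread::yield();   // assumed backoff policy
                    failCount = 0;
                }
            }
        }
    }

    void unlock()
    {
        // Unlocking a lock that is not held is a caller bug.
        assert(m_lock.load(std::memory_order_relaxed));
        m_lock.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> m_lock;
};

// Hypothetical stress test: several threads bump one plain counter.
long count_with_spinlock(int threadCount, int incrementsPerThread)
{
    TatasSpinlock lock;
    long counter = 0;   // non-atomic, protected only by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&] {
            for (int i = 0; i < incrementsPerThread; ++i)
            {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers)
    {
        w.join();
    }
    return counter;
}
```

With acquire on the successful exchange and release on the store in `unlock()`, every increment happens-before the next one, so `count_with_spinlock(4, 20000)` should return exactly 80000. A lost update would show up as a smaller number.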

> why are spinlock examples around the web using acquire/release memory order for spinlocks instead of sequentially consistent?

Because acquire/release semantics are enough to implement a mutex. On some architectures, sequentially consistent operations are more expensive than acquire/release ones.
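As a rough illustration of the cost difference on x86-64, compare the two unlock variants below. The names `g_flag`, `unlock_release`, `unlock_seq_cst`, and `both_variants_unlock` are hypothetical, and the instruction sequences in the comments describe what mainstream compilers typically emit, not something the standard guarantees.

```cpp
#include <atomic>
#include <cassert>

std::atomic<bool> g_flag(false);   // hypothetical lock word

// Release store: on x86-64 this is typically a plain `mov`.
void unlock_release()
{
    g_flag.store(false, std::memory_order_release);
}

// Sequentially consistent store: on x86-64 this is typically an `xchg`
// (or a `mov` followed by `mfence`), which is noticeably more expensive.
void unlock_seq_cst()
{
    g_flag.store(false, std::memory_order_seq_cst);
}

// Single-threaded check that both variants clear the flag.
bool both_variants_unlock()
{
    g_flag.store(true);
    unlock_release();
    const bool first = !g_flag.load();

    g_flag.store(true);
    unlock_seq_cst();
    const bool second = !g_flag.load();

    assert(!g_flag.load());        // the flag ends up cleared either way
    return first && second;
}
```

Both functions produce the same observable result in a single thread. The difference is only in how strongly the store is ordered against surrounding memory operations, and a lock release does not need the stronger form.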

I highly recommend watching Herb Sutter's talk "atomic<> Weapons: The C++ Memory Model and Modern Hardware"; it covers this subject in great detail.

solution1
1 2015-03-23 08:43:03
