
Lock-free stack: is this a correct usage of C++11 relaxed atomics? Can it be proven?

I've written a container for a very simple piece of data that needs to be synchronized across threads. I want top performance. I don't want to use locks.

I want to use "relaxed" atomics, partly for that little bit of extra oomph, and partly to really understand them.

I've been working on this a lot, and I'm at the point where this code passes every test I throw at it. That's not quite "proof", though, so I'm wondering whether there's anything I'm missing, or any other ways I can test this.

Here's my premise:

  • It is only important that a Node be properly pushed and popped, and that the Stack can never be invalidated.
  • I believe that the order of operations in memory is only important in one place:
    • Between the compare_exchange operations themselves. This is guaranteed, even with relaxed atomics.
  • The "ABA" problem is solved by adding identification numbers (tags) to the pointers. On 32-bit systems this requires a double-word compare_exchange; on 64-bit systems the unused upper 16 bits of the pointer are filled with the id number.
  • Therefore, the stack will always be in a valid state. (Right?)
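For the 64-bit case in the ABA bullet, here is a sketch of packing a pointer and a 16-bit tag into one 64-bit word using shifts and masks on an integer. It assumes user-space addresses fit in the low 48 bits (typical on x86-64 and AArch64 Linux), and the names `pack`, `unpack_ptr`, `unpack_tag` and `kTagShift` are hypothetical. Working on an integer avoids the question's `reinterpret_cast<stack_tag_t*>(this)[3]`, which relies on a little-endian layout.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: assumes 64-bit pointers whose user-space addresses fit in the
// low 48 bits, leaving the upper 16 bits free for an ABA counter.
static_assert(sizeof(void*) == 8, "this sketch assumes 64-bit pointers");

constexpr int kTagShift = 48;
constexpr std::uint64_t kPtrMask = (std::uint64_t{1} << kTagShift) - 1;

// Pack a pointer and a 16-bit tag into one 64-bit word.
inline std::uint64_t pack(void* p, std::uint16_t tag) {
    auto bits = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    assert((bits & ~kPtrMask) == 0 && "pointer uses the upper 16 bits");
    return bits | (static_cast<std::uint64_t>(tag) << kTagShift);
}

// Recover the pointer by masking off the tag bits.
inline void* unpack_ptr(std::uint64_t word) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(word & kPtrMask));
}

// Recover the tag from the upper 16 bits.
inline std::uint16_t unpack_tag(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> kTagShift);
}
```

A 16-bit tag wraps after 65,536 updates, so tagging makes ABA unlikely rather than impossible.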

Here's what I'm thinking. "Normally", the way we reason about code we're reading is to look at the order in which it's written. Memory can be read or written "out of order", but not in a way that invalidates the correctness of the program.

That changes in a multi-threaded environment. That's what memory fences are for: so that we can still look at the code and reason about how it's going to work.
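As a rough illustration of what fences buy you, here is a hypothetical message-passing sketch in which every atomic operation is relaxed and a pair of `std::atomic_thread_fence` calls supplies the ordering. Without the two fences, the read of `payload` would be a data race. The function name `publish_with_fences` is made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A plain (non-atomic) payload published through a relaxed flag,
// with standalone fences providing the ordering.
int publish_with_fences() {
    int payload = 0;
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                         // plain write
        std::atomic_thread_fence(std::memory_order_release);  // orders the write above...
        ready.store(true, std::memory_order_relaxed);         // ...before this store
    });

    int seen = 0;
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {      // spin until published
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
        seen = payload;                                       // guaranteed to read 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```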

So if everything can go out of order here, what am I doing with relaxed atomics? Isn't that going a bit too far?

I don't think so, but that's why I'm here asking for help.

The compare_exchange operations themselves guarantee sequential consistency with each other.
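What relaxed ordering does guarantee is narrower than sequential consistency: all read-modify-write operations on a single atomic object form one modification order, and each one reads the latest value in it. The sketch below (hypothetical function name) relies only on that property, so no increment is lost; it says nothing about the ordering of other memory, such as a node's fields.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each relaxed CAS on `counter` reads the latest value in its modification
// order, so concurrent increments are never lost. Other memory locations
// get no ordering guarantees from this.
long count_with_relaxed_cas(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                long expected = counter.load(std::memory_order_relaxed);
                // On failure, `expected` is refreshed with the current value.
                while (!counter.compare_exchange_weak(expected, expected + 1,
                                                      std::memory_order_relaxed,
                                                      std::memory_order_relaxed)) {
                }
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load(std::memory_order_relaxed);
}
```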

The only other read or write of an atomic is the load of the head's initial value before a compare_exchange. It is done as part of a variable's initialization. As far as I can tell, it doesn't matter whether this load returns the "proper" (latest) value.
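The part of this premise that does hold can be checked directly: when `compare_exchange_*` fails, it stores the object's current value into `expected`, so a stale initial load only costs an extra iteration. This single-threaded sketch (hypothetical function name) forces such a failure. It does not make any non-atomic data written by other threads visible.

```cpp
#include <atomic>
#include <cassert>

// Returns the value left in `expected` after a deliberately stale CAS.
int refreshed_after_failed_cas() {
    std::atomic<int> head{10};
    int expected = 3;  // pretend this came from an out-of-date load
    bool ok = head.compare_exchange_strong(expected, 99,
                                           std::memory_order_relaxed,
                                           std::memory_order_relaxed);
    assert(!ok);                // 3 != 10, so nothing was stored
    assert(head.load() == 10);  // head is unchanged
    return expected;            // now holds the current value, 10
}
```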

Current code:

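#include <atomic>
#include <cstdint>

// Not shown in the original: stack_tag_t is assumed to be a 16-bit tag type,
// and PROCESSOR_BITS is assumed to be defined elsewhere as 64 or 32.
using stack_tag_t = std::uint16_t;

struct node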
{
    node *n_;
#if PROCESSOR_BITS == 64
    inline constexpr node() : n_{ nullptr }                 { }
    inline constexpr node(node* n) : n_{ n }                { }
    inline void tag(const stack_tag_t t)                    { reinterpret_cast<stack_tag_t*>(this)[3] = t; }
    inline stack_tag_t read_tag()                           { return reinterpret_cast<stack_tag_t*>(this)[3]; }
    inline void clear_pointer()                             { tag(0); }
#elif PROCESSOR_BITS == 32
    stack_tag_t t_;
    inline constexpr node() : n_{ nullptr }, t_{ 0 }        { }
    inline constexpr node(node* n) : n_{ n }, t_{ 0 }       { }
    inline void tag(const stack_tag_t t)                    { t_ = t; }
    inline stack_tag_t read_tag()                           { return t_; }
    inline void clear_pointer()                             { }
#endif
    inline void set(node* n, const stack_tag_t t)           { n_ = n; tag(t); }
};

using std::memory_order_relaxed;
class stack
{
public:
    constexpr stack() : head_{}{}
    void push(node* n)
    {
        node next{n}, head{head_.load(memory_order_relaxed)};
        do
        {
            n->n_ = head.n_;
            next.tag(head.read_tag() + 1);
        } while (!head_.compare_exchange_weak(head, next, memory_order_relaxed, memory_order_relaxed));
    }

    bool pop(node*& n)
    {
        node clean, next, head{head_.load(memory_order_relaxed)};
        do
        {
            clean.set(head.n_, 0);

            if (!clean.n_)
                return false;

            next.set(clean.n_->n_, head.read_tag() + 1);
        } while (!head_.compare_exchange_weak(head, next, memory_order_relaxed, memory_order_relaxed));

        n = clean.n_;
        return true;
    }
protected:
    std::atomic<node> head_;
};

What's different about this question compared to others? Relaxed atomics. They make a big difference to the question.

So, what do you think? Is there anything I'm missing?

push is broken, since you do not update node->_next after a compareAndSwap failure. It's possible that the node you originally stored with node->setNext has been popped from the top of the stack by another thread by the time the next compareAndSwap attempt succeeds. As a result, some thread thinks it has popped a node from the stack, but this thread has put it back on the stack. It should be:

void push(Node* node) noexcept
{
    Node* n = _head.next();
    do {
        node->setNext(n);  // re-link on every attempt, not just the first
    } while (!_head.compareAndSwap(n, node));  // presumably reloads n on failure
}
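Rendered with `std::atomic<Node*>` instead of the answer's wrapper, the same fix can be written with a common idiom that passes `node->next` itself as the `expected` argument, so every failed CAS re-links the node to the current head. `Node` and `push` are hypothetical stand-ins, and the release ordering on success is an added assumption so that the node's contents are published with it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type for this sketch.
struct Node {
    int value;
    Node* next = nullptr;
};

// compare_exchange_weak takes node->next as `expected` and overwrites it with
// the current head whenever the CAS fails, so the node always points at the
// head it is about to replace.
void push(std::atomic<Node*>& head, Node* node) {
    node->next = head.load(std::memory_order_relaxed);
    while (!head.compare_exchange_weak(node->next, node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    }
}
```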

Also, since next and setNext use memory_order_relaxed, there's no guarantee that _head.next() here returns the node most recently pushed. It's possible to leak nodes from the top of the stack. The same problem obviously exists in pop as well: _head.next() may return a node that was previously, but is no longer, at the top of the stack. If the returned value is nullptr, you may fail to pop when the stack is not actually empty.

pop can also have undefined behavior if two threads try to pop the last node from the stack at the same time. They both see the same value for _head.next(), and one thread successfully completes pop. The other thread enters the while loop (since the observed node pointer is not nullptr), but the compareAndSwap loop soon updates it to nullptr since the stack is now empty. On the next iteration of the loop, that nullptr is dereferenced to get its _next pointer, and much hilarity ensues.
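Here is a sketch of a `pop` that avoids that null dereference: the emptiness test sits in the loop condition, so it is re-evaluated against the refreshed `old` after every failed CAS. The names are hypothetical, and the sketch deliberately does not address ABA or safe reclamation; nodes are assumed never to be freed while another thread may still read `old->next`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type for this sketch.
struct Node {
    int value;
    Node* next = nullptr;
};

// When the CAS fails, `old` is refreshed with the current head, which may now
// be nullptr; the loop condition checks it before old->next is read again.
Node* pop(std::atomic<Node*>& head) {
    Node* old = head.load(std::memory_order_acquire);
    while (old != nullptr &&
           !head.compare_exchange_weak(old, old->next,
                                       std::memory_order_acquire,
                                       std::memory_order_acquire)) {
    }
    return old;  // nullptr means the stack was observed empty
}
```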

pop is also clearly suffering from ABA. Two threads can see the same node at the top of the stack. Say one thread gets to the point of evaluating the _next pointer and then blocks. The other thread successfully pops the node, pushes 5 new nodes, and then pushes that original node again, all before the first thread wakes. The first thread's compareAndSwap will succeed (the top-of-stack node is the same) but store the old _next value into _head instead of the new one. The five nodes pushed by the other thread are all leaked. This would be the case with memory_order_seq_cst as well.
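The interleaving above can be replayed deterministically on one thread. This sketch swaps in index-based nodes and packs a 32-bit index with a 32-bit tag into a `std::atomic<std::uint64_t>` (a different layout from the question's pointer bits, chosen so the atomic stays 8 bytes), and uses a single node N in place of the five. Everything named here is hypothetical. Without tags the stale CAS succeeds and N is lost; with a tag bumped on every head change, it fails.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical index-based nodes: links are array indices, kNil marks "none".
constexpr std::uint32_t kNil = 0xFFFFFFFFu;

inline std::uint64_t make_head(std::uint32_t index, std::uint32_t tag) {
    return (static_cast<std::uint64_t>(tag) << 32) | index;
}
inline std::uint32_t index_of(std::uint64_t h) { return static_cast<std::uint32_t>(h); }
inline std::uint32_t tag_of(std::uint64_t h) { return static_cast<std::uint32_t>(h >> 32); }

// Thread 1 reads top A and its successor X, then stalls; thread 2 pops A,
// pushes N, and pushes A again; then thread 1's CAS runs. Returns whether
// that stale CAS succeeded.
bool stale_cas_succeeds(bool use_tag) {
    std::uint32_t next[3];                  // next[i] is node i's link
    const std::uint32_t A = 0, X = 1, N = 2;
    next[A] = X;
    next[X] = kNil;
    std::atomic<std::uint64_t> head{make_head(A, 0)};

    // Thread 1: snapshot of the head and of A's successor.
    std::uint64_t seen = head.load();
    std::uint32_t seen_next = next[index_of(seen)];  // X

    // Thread 2: stands in for its successful CASes; bumps the tag if enabled.
    auto set_head = [&](std::uint32_t idx) {
        std::uint64_t h = head.load();
        head.store(make_head(idx, use_tag ? tag_of(h) + 1 : 0));
    };
    set_head(next[A]);                      // pop A:  X
    next[N] = X; set_head(N);               // push N: N, X
    next[A] = N; set_head(A);               // push A: A, N, X

    // Thread 1 resumes with its stale snapshot.
    std::uint64_t desired = make_head(seen_next, use_tag ? tag_of(seen) + 1 : 0);
    return head.compare_exchange_strong(seen, desired);
}
```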

Leaving aside the difficulty of implementing the pop operation, I think memory_order_relaxed is inadequate. Before pushing a node, one presumably writes some value(s) into it, which will be read when the node is popped. You need some synchronization mechanism to ensure that the values have actually been written before they are read. memory_order_relaxed does not provide that synchronization; memory_order_acquire/memory_order_release would.
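A minimal sketch of the release/acquire pairing suggested here, under stated assumptions: several producers write a node's `value` with a plain store and then push it with a release CAS, and a single consumer pops with acquire before reading `value`. With one consumer and no node reuse, the ABA case cannot arise in this sketch. All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Release on success: writes to n->value and n->next happen-before any
// acquire that reads this head value (or a later value in its release sequence).
void push(std::atomic<Node*>& head, Node* n) {
    n->next = head.load(std::memory_order_relaxed);
    while (!head.compare_exchange_weak(n->next, n, std::memory_order_release,
                                       std::memory_order_relaxed)) {
    }
}

// Single consumer only: no other thread removes or re-pushes nodes.
Node* pop(std::atomic<Node*>& head) {
    Node* old = head.load(std::memory_order_acquire);
    while (old && !head.compare_exchange_weak(old, old->next,
                                              std::memory_order_acquire,
                                              std::memory_order_acquire)) {
    }
    return old;
}

// Producers set value before pushing; the consumer sums what it pops.
long produce_and_consume(int producers, int per_producer) {
    std::atomic<Node*> head{nullptr};
    std::vector<Node> nodes(static_cast<std::size_t>(producers * per_producer));
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&, p] {
            for (int i = 0; i < per_producer; ++i) {
                Node& n = nodes[static_cast<std::size_t>(p * per_producer + i)];
                n.value = 1;  // plain write, published by the release CAS
                push(head, &n);
            }
        });
    }
    long sum = 0;
    int popped = 0;
    while (popped < producers * per_producer) {
        if (Node* n = pop(head)) {
            sum += n->value;
            ++popped;
        }
    }
    for (auto& t : threads) t.join();
    return sum;
}
```

Because each successful push CAS is a read-modify-write, it continues the release sequence of earlier pushes, so the consumer's acquire also synchronizes with producers whose nodes lie deeper in the list.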

This code is completely broken.

The only reason it appears to work is that current compilers aren't very aggressive about reordering across atomic operations, and x86 processors have pretty strong guarantees.

The first problem is that without synchronization, there is no guarantee that the client of this data structure will even see the fields of the node object as initialized. The next issue is that without synchronization, the push operation can read arbitrarily old values for the head's tag.

We have developed a tool, CDSChecker, that simulates most behaviors that the memory model allows. It is open source and free. Run it on your data structure to see some interesting executions.

Proving anything about code that uses relaxed atomics is a big challenge at this point. Most proof methods break down because they are typically inductive in nature, and you don't have an order to induct on. So you run into out-of-thin-air read issues...

Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.