Lock-free stack - Is this a correct usage of C++11 relaxed atomics? Can it be proven?
I've written a container for a very simple piece of data that needs to be synchronized across threads. I want top performance. I don't want to use locks.
I want to use "relaxed" atomics. Partly for that little bit of extra oomph, and partly to really understand them.
I've been working on this a lot, and I'm at the point where this code passes all the tests I throw at it. That's not quite "proof" though, so I'm wondering if there's anything I'm missing, or any other ways I can test this.
Here's my premise:
Here's what I'm thinking. "Normally", the way we reason about code that we're reading is to look at the order in which it's written. Memory can be read or written "out of order", but not in a way that invalidates the correctness of the program.
That changes in a multi-threaded environment. That's what memory fences are for: so that we can still look at the code and reason about how it's going to work.
So if everything can go out of order here, what am I doing with relaxed atomics? Isn't that going a bit too far?
I don't think so, but that's why I'm here asking for help.
The compare_exchange operations themselves give a guarantee of sequential consistency with each other.
The only other time there is a read or write to an atomic is to get the head's initial value before a compare_exchange. It is set as part of the initialization of a variable. As far as I can tell, it would be irrelevant whether or not this operation brings back a "proper" value.
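To illustrate the last point, here is a minimal sketch (not the container itself; the function name and the values are made up for illustration) of why a wrong initial guess does not by itself break a compare_exchange loop: a failed compare_exchange writes the value it actually found back into the expected argument. It only shows that the comparison is made against the atomic's current value, and says nothing about how other memory is ordered around it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical demo: start from a deliberately wrong "expected" value and
// watch the failed compare_exchange correct it.
inline bool stale_expected_is_refreshed() {
    std::atomic<int> head{7};
    int expected = 3;  // stale guess, as a relaxed load might return

    // Fails because head holds 7, not 3; `expected` is overwritten with 7.
    const bool first = head.compare_exchange_strong(
        expected, 9, std::memory_order_relaxed, std::memory_order_relaxed);
    const int refreshed = expected;
    assert(!first);

    // Succeeds now that `expected` matches the stored value.
    const bool second = head.compare_exchange_strong(
        expected, 9, std::memory_order_relaxed, std::memory_order_relaxed);

    return refreshed == 7 && second &&
           head.load(std::memory_order_relaxed) == 9;
}
```

Calling stale_expected_is_refreshed() from a single thread returns true.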
Current code:
    struct node
    {
        node *n_;
    #if PROCESSOR_BITS == 64
        inline constexpr node() : n_{ nullptr } { }
        inline constexpr node(node* n) : n_{ n } { }
        inline void tag(const stack_tag_t t) { reinterpret_cast<stack_tag_t*>(this)[3] = t; }
        inline stack_tag_t read_tag() { return reinterpret_cast<stack_tag_t*>(this)[3]; }
        inline void clear_pointer() { tag(0); }
    #elif PROCESSOR_BITS == 32
        stack_tag_t t_;
        inline constexpr node() : n_{ nullptr }, t_{ 0 } { }
        inline constexpr node(node* n) : n_{ n }, t_{ 0 } { }
        inline void tag(const stack_tag_t t) { t_ = t; }
        inline stack_tag_t read_tag() { return t_; }
        inline void clear_pointer() { }
    #endif
        inline void set(node* n, const stack_tag_t t) { n_ = n; tag(t); }
    };

    using std::memory_order_relaxed;

    class stack
    {
    public:
        constexpr stack() : head_{}{}

        void push(node* n)
        {
            node next{n}, head{head_.load(memory_order_relaxed)};
            do
            {
                n->n_ = head.n_;
                next.tag(head.read_tag() + 1);
            } while (!head_.compare_exchange_weak(head, next, memory_order_relaxed, memory_order_relaxed));
        }

        bool pop(node*& n)
        {
            node clean, next, head{head_.load(memory_order_relaxed)};
            do
            {
                clean.set(head.n_, 0);
                if (!clean.n_)
                    return false;
                next.set(clean.n_->n_, head.read_tag() + 1);
            } while (!head_.compare_exchange_weak(head, next, memory_order_relaxed, memory_order_relaxed));
            n = clean.n_;
            return true;
        }

    protected:
        std::atomic<node> head_;
    };
What's different about this question compared to others? Relaxed atomics. They make a big difference to the question.
So, what do you think? Is there anything I'm missing?
push is broken, since you do not update node->_next after a compareAndSwap failure. It's possible that the node you originally stored with node->setNext has been popped from the top of the stack by another thread by the time the next compareAndSwap attempt succeeds. As a result, some thread thinks it has popped a node from the stack, but this thread has put it back on the stack. It should be:
    void push(Node* node) noexcept
    {
        Node* n = _head.next();
        do {
            node->setNext(n);
        } while (!_head.compareAndSwap(n, node));
    }
Also, since next and setNext use memory_order_relaxed, there's no guarantee that _head.next() here returns the node most recently pushed. It's possible to leak nodes from the top of the stack. The same problem obviously exists in pop as well: _head.next() may return a node that was previously, but is no longer, at the top of the stack. If the returned value is nullptr, you may fail to pop when the stack is not actually empty.
pop can also have undefined behavior if two threads try to pop the last node from the stack at the same time. They both see the same value for _head.next(), and one thread successfully completes its pop. The other thread enters the while loop, since the node pointer it observed is not nullptr, but the compareAndSwap loop soon updates it to nullptr because the stack is now empty. On the next iteration of the loop, that nullptr is dereferenced to get its _next pointer, and much hilarity ensues.
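One way to avoid that particular failure is to re-test for nullptr every time the compare-and-swap refreshes the observed pointer. The following is only a sketch under assumptions: it uses std::atomic<Node*> directly rather than the wrapper from the question, the Node type and function names are hypothetical, the default sequentially consistent ordering is used, nodes are assumed never to be freed while the stack is in use, and ABA is not addressed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal node, for illustration only.
struct Node {
    int value;
    Node* next;
};

// Returns the popped node, or nullptr when the stack is observed to be empty.
inline Node* pop(std::atomic<Node*>& head) noexcept {
    Node* top = head.load();
    // The null test is part of the loop condition, so it runs again each time
    // a failed compare_exchange_weak overwrites `top` with the current head.
    while (top != nullptr && !head.compare_exchange_weak(top, top->next)) {
    }
    return top;
}

// Single-threaded check: popping the last node and then popping again
// must not dereference a null pointer.
inline bool pop_handles_empty_stack() {
    Node a{1, nullptr};
    std::atomic<Node*> head{&a};
    Node* first = pop(head);   // removes the only node
    assert(first == &a);
    Node* second = pop(head);  // stack is empty now
    return second == nullptr && head.load() == nullptr;
}
```

Here pop_handles_empty_stack() returns true: the second call sees nullptr and returns it without touching top->next.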
pop is also clearly suffering from ABA. Two threads can see the same node at the top of the stack. Say one thread gets to the point of evaluating the _next pointer and then blocks. The other thread successfully pops the node, pushes 5 new nodes, and then pushes that original node again, all before the first thread wakes. The first thread's compareAndSwap will succeed, because the top-of-stack node is the same, but it stores the old _next value into _head instead of the new one. The five nodes pushed by the other thread are all leaked. This would be the case with memory_order_seq_cst as well.
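The interleaving can be replayed by hand on a single thread, which makes the leak easy to see. This is a sketch, not the code from the question: Node, push, pop and nodes_reachable_after_aba are hypothetical names, the stack is a plain std::atomic<Node*> with default ordering, and the two "threads" are just consecutive blocks of statements.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    Node* next;
};

inline void push(std::atomic<Node*>& head, Node* n) {
    Node* expected = head.load();
    do {
        n->next = expected;  // re-link on every attempt
    } while (!head.compare_exchange_weak(expected, n));
}

inline Node* pop(std::atomic<Node*>& head) {
    Node* top = head.load();
    while (top != nullptr && !head.compare_exchange_weak(top, top->next)) {
    }
    return top;
}

// Replays the ABA scenario and returns how many nodes are still reachable
// from the head afterwards.
inline int nodes_reachable_after_aba() {
    Node base{0, nullptr};
    Node x{1, nullptr};
    Node extra[5] = {};
    std::atomic<Node*> head{nullptr};
    push(head, &base);
    push(head, &x);

    // "Thread A" begins a pop: it reads the head and its next pointer, then stalls.
    Node* a_top = head.load();   // &x
    Node* a_next = a_top->next;  // &base

    // "Thread B" runs to completion in the meantime.
    Node* popped = pop(head);    // removes x
    for (Node& e : extra) push(head, &e);
    push(head, popped);          // x is on top again, now linked to extra[4]

    // Thread A resumes. The head pointer equals a_top, so the CAS succeeds
    // and installs the stale next pointer.
    const bool swapped = head.compare_exchange_strong(a_top, a_next);
    assert(swapped);

    int count = 0;
    for (Node* p = head.load(); p != nullptr; p = p->next) ++count;
    return count;
}
```

Before thread A resumes, seven nodes are reachable (x, the five extras, and base). Afterwards nodes_reachable_after_aba() returns 1: only base is left, and the five nodes pushed by thread B can no longer be reached from the head.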
Leaving to one side the difficulty of implementing the pop operation, I think memory_order_relaxed is inadequate. Before pushing the node, one assumes that some value(s) will be written into it, which will be read when the node is popped. You need some synchronization mechanism to ensure that the values have actually been written before they are read. memory_order_relaxed does not provide that synchronization; memory_order_acquire / memory_order_release would.
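As a rough sketch of what that could look like (an assumption on my part, not a verified implementation): the successful compare-exchange in push uses memory_order_release so that the writes to the node happen-before any pop that acquires the same head value, and pop uses memory_order_acquire on every load that may hand it a node pointer. Node, Stack and delivers_payloads are hypothetical names, nodes are assumed never to be freed while the stack is live, and ABA is deliberately left unsolved here.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int payload;
    Node* next;
};

struct Stack {
    std::atomic<Node*> head{nullptr};

    void push(Node* n) noexcept {
        Node* expected = head.load(std::memory_order_relaxed);
        do {
            n->next = expected;
        } while (!head.compare_exchange_weak(
            expected, n,
            std::memory_order_release,    // publishes payload and next
            std::memory_order_relaxed));  // failure only refreshes `expected`
    }

    Node* pop() noexcept {
        // Acquire pairs with the release in push, so top->next and
        // top->payload are visible before they are read.
        Node* top = head.load(std::memory_order_acquire);
        while (top != nullptr &&
               !head.compare_exchange_weak(top, top->next,
                                           std::memory_order_acquire,
                                           std::memory_order_acquire)) {
        }
        return top;
    }
};

// Single-threaded smoke test of the interface; it does not exercise the
// memory ordering itself.
inline bool delivers_payloads() {
    Stack s;
    Node a{10, nullptr}, b{20, nullptr};
    s.push(&a);
    s.push(&b);
    Node* first = s.pop();
    Node* second = s.pop();
    assert(first != nullptr && second != nullptr);
    return first->payload == 20 && second->payload == 10 && s.pop() == nullptr;
}
```

A single-threaded run like delivers_payloads() only checks the LIFO behavior. Whether the orderings are sufficient has to be argued from the memory model or checked with a tool that explores its allowed executions.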
This code is completely broken.
The only reason this appears to work is that current compilers aren't very aggressive about reordering across atomic operations, and x86 processors give pretty strong guarantees.
The first problem is that without synchronization, there is no guarantee that the client of this data structure will even see the fields of the node object as initialized. The next issue is that without synchronization, the push operation can read arbitrarily old values for the head's tag.
We have developed a tool, CDSChecker, that simulates most behaviors that the memory model allows. It is open source and free. Run it on your data structure to see some interesting executions.
Proving anything about code that uses relaxed atomics is a big challenge at this point. Most proof methods break down because they are typically inductive in nature, and you don't have an order to induct on. So you run into out-of-thin-air read issues...