
C++ lock free queue implementation single producer single consumer

I tried a lock free single-producer/single-consumer implementation link. The implementation uses a list as the underlying data structure and relies on the fact that only the producer thread modifies the list. The consumer thread moves the head variable if _head is not equal to _tail.

int produced_count_, consumed_count_;
std::list<int> data_queue_;
std::list<int>::iterator head_, tail_;

// Producer thread only: appends a value, updates tail_, and erases the
// nodes before head_ (which the consumer has already read).
void ProducerConumer::produce() {
    static int count = 0;
    data_queue_.push_back(int(count++));
    ++produced_count_;
    tail_ = data_queue_.end();
    data_queue_.erase(data_queue_.begin(), head_);
}

// Consumer thread only: advances head_ by one if there is an
// unconsumed element, i.e. if the next node is not tail_.
bool ProducerConumer::consume() {
    auto it = head_;
    ++it;
    if(it != tail_) {
        head_ = it;
        ++consumed_count_;
        int t = *it;   // read the value (unused here)
        return true;
    }

    return false;
}

At any point the head iterator points to a value that has already been read.

As there is no synchronization here, I was under the impression that the implementation would not work, because the writes by one thread might not be visible to the other thread. But when I tested my code, the producer and consumer always produced/consumed the same number of units. Can someone explain how this code can work without explicit synchronization? (I did not expect changes to tail_ and head_ to be visible to the other thread.)

The code that controls the producer/consumer threads is as follows:

consumer_thread_ = std::thread([this]() {
    set_cpu_affinity(0);
    std::chrono::milliseconds start_time = current_time();
    while((current_time() - start_time) < std::chrono::milliseconds(150)) {
        this->consume();
    }
    std::cout << "Data queue size from consumer is " << data_queue_.size() << " time " << current_time().count() << "\n";
});

producer_thread_ = std::thread([this]() {
    set_cpu_affinity(7);
    std::chrono::milliseconds start_time = current_time();
    while((current_time() - start_time) < std::chrono::milliseconds(100)) {
        this->produce();
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::cout << "Data queue size from producer is " << data_queue_.size() << " time " << current_time().count() << "\n";
});

I ensure that the producer thread outlives the consumer thread by adding the sleep_for at the end of the producer thread.

BTW, here is a dissection of the implementation where Herb Sutter discussed what's wrong with it link. But he never talked about whether changes to tail_ and head_ are visible to the other thread.

Debug builds will often "happen to work", especially on x86, because the constraints a debug build puts on code-gen block compile-time reordering, and x86 hardware blocks most run-time reordering.

If you compile in debug mode, memory accesses will happen in program order, and the compiler won't keep values in registers across statements. (A bit like volatile, which can be used to roll your own atomics; but don't: When to use volatile with multi threading?) Still, cache is coherent, so plain loads and stores in asm are sufficient for global visibility (in some order).

They'll be atomic because they're int-sized and aligned, and the compiler does them with a single instruction because it's not a DeathStation 9000. Naturally aligned int loads and stores are atomic in asm on normal machines like x86, but that's not guaranteed in C++. (https://lwn.net/Articles/793253/)
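A hedged sketch of how to get that "single aligned load/store" behaviour as a guarantee instead of a happy accident, using std::atomic (the names here are illustrative, not the asker's):

#include <atomic>

// Illustrative counter; only the producer writes it, the consumer only reads.
std::atomic<int> produced_count{0};

// Expect a plain machine load/store with no hidden lock on normal targets.
static_assert(std::atomic<int>::is_always_lock_free);

void producer_step() {
    // The producer is the only writer, so a load plus a store is enough;
    // no atomic RMW needed. On x86 both compile to ordinary `mov`
    // instructions, but unlike a plain int they can't tear and aren't
    // a data race.
    int c = produced_count.load(std::memory_order_relaxed);
    produced_count.store(c + 1, std::memory_order_relaxed);
}

int consumer_peek() {
    return produced_count.load(std::memory_order_relaxed);
}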

If you only test on x86, the hardware memory model gives you program order plus a store buffer, so you effectively get the same asm as you would with std::atomic memory_order_acquire and release. (Because debug builds don't reorder between statements.)
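Spelled out explicitly, the acquire/release pattern that x86 hands you for free looks like this (a minimal sketch with hypothetical names, not the asker's code); on x86 both the release store and the acquire load compile to plain movs:

#include <atomic>

int payload = 0;                      // non-atomic data being handed over
std::atomic<bool> ready{false};       // publication flag

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. then publish it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) // 3. wait for publication
        ;                                          //    (spin for brevity)
    // 4. The acquire load synchronizes with the release store, so the
    //    write to payload happens-before this read: seen == 42.
    int seen = payload;
    (void)seen;
}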

C++ undefined behaviour (including this data-race UB) does not mean "guaranteed to fail or crash" - that's what makes it so nasty, and why testing isn't sufficient to find it.

Compile with optimization enabled and you might see big problems, depending on compile-time reordering and hoisting choices. e.g. if the compiler can keep a variable in a register for the duration of a loop, it'll never re-read from cache/memory and never see what the other thread stored. Among other problems. Multithreading program stuck in optimized mode but runs normally in -O0
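A minimal sketch of that hoisting failure mode (illustrative names, not the asker's code):

#include <atomic>
#include <thread>

bool plain_done = false;                 // data race: UB
std::atomic<bool> atomic_done{false};    // well-defined

void waiter_broken() {
    // With optimization, the compiler may load plain_done once, keep it in
    // a register, and effectively turn this into `if (!plain_done) while(true);`,
    // an infinite loop that never sees the other thread's store.
    while (!plain_done) { }
}

void waiter_fixed() {
    // An atomic load can't be hoisted out of the loop; every iteration
    // really re-reads memory.
    while (!atomic_done.load(std::memory_order_acquire)) { }
}

int main() {
    std::thread t(waiter_fixed);
    atomic_done.store(true, std::memory_order_release);
    t.join();
}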

Code isn't very useful if it only happens to work in "training wheels" mode because you haven't told the compiler how to optimize it safely. (By using std::atomic, for example.)


I haven't looked in a lot of detail at your code, but I don't think you have any variables that are modified by both threads. In a circular-buffer queue, you often have a ++ increment on a variable that's RMWed by the producer but read-only by the consumer, and vice versa for a read position. Those don't need to be an atomic RMW, only an atomic store, so the other thread's atomic load can see a not-torn value. That happens "naturally" in asm.

Here I think you're just storing a new head, and the other thread is just reading that.
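For comparison, a minimal circular-buffer SPSC sketch along those lines (my own illustrative code, assuming a fixed capacity N): each index has exactly one writer, so its "increment" is just a load, an add, and a release store.

#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
class SpscRing {
    T buf_[N];
    std::atomic<std::size_t> write_idx_{0};  // written only by the producer
    std::atomic<std::size_t> read_idx_{0};   // written only by the consumer

public:
    bool push(const T& v) {                  // producer thread only
        std::size_t w = write_idx_.load(std::memory_order_relaxed);
        std::size_t next = (w + 1) % N;
        if (next == read_idx_.load(std::memory_order_acquire))
            return false;                    // full (one slot left unused)
        buf_[w] = v;                         // write the slot first...
        write_idx_.store(next, std::memory_order_release);  // ...then publish
        return true;
    }

    bool pop(T& out) {                       // consumer thread only
        std::size_t r = read_idx_.load(std::memory_order_relaxed);
        if (r == write_idx_.load(std::memory_order_acquire))
            return false;                    // empty
        out = buf_[r];                       // read the slot first...
        read_idx_.store((r + 1) % N, std::memory_order_release);  // ...then free it
        return true;
    }
};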

In a linked list, deallocation can be a problem, especially with multiple consumers. You can't free or recycle a node until you're sure that no thread still has a pointer to it. Garbage-collected languages / runtimes can use linked lists for lockless queues much more easily, because the GC already has to handle the same kind of checking for objects in general.

So make sure you get this right if rolling your own; it can be tricky. Although as long as you only link a node into the list after it's fully constructed, and there's only one consumer, you never have visibility of half-constructed nodes, and you never have one thread deallocating a node that another thread might wake up and continue reading.
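To make that concrete, here is a hedged sketch of the same list-of-nodes idea with the atomics written out, in the spirit of Sutter's corrected queue (the class and member names are mine, not the linked article's): the producer owns allocation and reclamation, the consumer only advances a divider.

#include <atomic>

template <typename T>
class SpscListQueue {
    struct Node {
        T value;
        Node* next = nullptr;        // written by the producer before publishing
    };
    Node* first_;                    // producer-only: oldest allocated node
    std::atomic<Node*> divider_;     // last node the consumer has consumed
    std::atomic<Node*> last_;        // newest node the producer has published

public:
    SpscListQueue() {
        first_ = new Node{};         // dummy node so the list is never empty
        divider_.store(first_, std::memory_order_relaxed);
        last_.store(first_, std::memory_order_relaxed);
    }
    ~SpscListQueue() {
        while (first_) { Node* n = first_; first_ = first_->next; delete n; }
    }

    void produce(const T& v) {       // producer thread only
        Node* n = new Node{v};
        Node* old_last = last_.load(std::memory_order_relaxed);
        old_last->next = n;                          // link the fully built node
        last_.store(n, std::memory_order_release);   // then publish it
        // Reclaim nodes the consumer has finished with: everything strictly
        // before divider_ can no longer be referenced by the consumer.
        Node* div = divider_.load(std::memory_order_acquire);
        while (first_ != div) {
            Node* tmp = first_;
            first_ = first_->next;
            delete tmp;
        }
    }

    bool consume(T& out) {           // consumer thread only
        Node* div = divider_.load(std::memory_order_relaxed);
        if (div != last_.load(std::memory_order_acquire)) {
            out = div->next->value;                  // safe: node was published
            divider_.store(div->next, std::memory_order_release);
            return true;
        }
        return false;                // queue empty
    }
};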

The article says:

Another issue is using the standard std::list. While the article mentions that it is the developer responsibility to check that the reading/writing std::list::iterator is atomic, this turns out to be too restrictive. While gcc/MSVC++2003 has 4-byte iterators, MSVC++2005 has 8-byte iterators in Release mode and 12-byte iterators in Debug mode.

It is your responsibility to ensure that iterators are atomic, and that is not the case for std::list. There are no guarantees on read/write operations from different threads unless you explicitly specify the data as atomic. However, even if "undefined behavior" means "nasal demons", there is nothing wrong if those demons happen to be observed as consistent synchronization.
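As a quick way to see what that assumption rests on, here is a small probe (my own illustration) that prints the properties the trick silently depends on; both vary between standard libraries and between debug/release builds, and neither is guaranteed by the standard:

#include <list>
#include <type_traits>
#include <cstdio>

int main() {
    using It = std::list<int>::iterator;
    // The size and trivial copyability of the iterator are exactly what
    // "reads/writes of the iterator happen to be atomic" quietly relies on.
    std::printf("sizeof(std::list<int>::iterator) = %zu\n", sizeof(It));
    std::printf("trivially copyable               = %d\n",
                (int)std::is_trivially_copyable_v<It>);
}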
