
Lock-free single-producer multiple-consumer data structure using atomics

I recently wrote sample code like the code below (the real code is much more complicated). After watching Hans Boehm's CppCon 2016 talk on atomics, I am a bit anxious about whether my code works.

produce is called by a single producer thread, and consume is called by multiple consumer threads. The producer only publishes data at even sequence numbers (2, 4, 6, 8, ...), and stores the next odd sequence number (1, 3, 5, 7, ...) before updating the data, to indicate that the data might be dirty. Consumers try to fetch data at the same even sequence numbers (2, 4, 6, ...).

The consumer double-checks the sequence number after reading, to make sure the data is good (i.e. was not updated by the producer during the read).

I think my code works fine on x86_64 (my target platform), because x86_64 does not reorder stores with other stores, or loads with other loads or stores, but I suspect it is wrong on other platforms.

Am I correct that the data assignment (in produce) can be moved above the store(n-1), so that a consumer reads corrupted data but the t == t2 check still succeeds?

struct S 
{
    atomic<int64_t> seq;
    // data members of primitive type int, double etc    
    ...
};

S s;

void produce(int64_t n, ...) // ... for above data members
{
    s.seq.store(n-1, std::memory_order_release); // indicates it's working on data members

    // assign data members of s
    ...

    s.seq.store(n, std::memory_order_release); // complete updating
}

bool consume(int64_t n, ...) // ... for interested fields passed as reference
{
    auto t = s.seq.load(std::memory_order_acquire);

    if (t == n)
    {
        // read fields
        ...

        auto t2 = s.seq.load(std::memory_order_acquire);
        if (t == t2)
            return true;
    }        

    return false;
}

Compile-time reordering can still bite you when targeting x86, because the compiler optimizes to preserve the behaviour of the program on the C++ abstract machine, not any stronger architecture-dependent behaviour. Since we want to avoid memory_order_seq_cst, reordering is allowed.

Yes, your stores can reorder as you suggest. Your loads can also reorder with the t2 load, since an acquire-load is only a one-way barrier. It would even be legal for a compiler to optimize away the t2 check entirely: if a reordering is possible, the compiler is allowed to decide that it's what always happens, and apply the as-if rule to make more efficient code. (Current compilers usually don't, but this is definitely allowed by the current standard as written. See the conclusion of a discussion about this, with links to standards proposals.)

Your options for preventing reordering are:

  • Make all the data-member stores/loads atomic, with release semantics for the stores and acquire semantics for the loads. (An acquire-load of the last data member would keep the t2 load from being done first.)
  • Use barriers (aka fences) to order all the non-atomic stores and non-atomic loads as a group.

    As Jeff Preshing explains, a mo_release fence isn't the same thing as a mo_release store; the fence is the kind of bidirectional barrier we need here. std::atomic just recycles the std::mo_ names for fences instead of giving them different names.

    (BTW, the non-atomic stores/loads should really be atomic with mo_relaxed, because it's technically undefined behaviour to read them at all while they might be in the process of being rewritten, even if you decide not to look at what you read.)
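As a concrete illustration of the first option, here is a minimal sketch with the `...` placeholders filled in. The fields `x` and `y` and the names `produce1`/`consume1` are made up for illustration; each payload field is itself an atomic, so concurrent reads during a rewrite are well-defined:

```cpp
#include <atomic>
#include <cstdint>

// Sketch of option 1: every payload field is an atomic.
// The fields x and y are hypothetical stand-ins for the real data members.
struct S1 {
    std::atomic<int64_t> seq{0};
    std::atomic<int>     x{0};
    std::atomic<double>  y{0.0};
};

S1 s1;

void produce1(int64_t n, int x, double y)
{
    s1.seq.store(n - 1, std::memory_order_relaxed);  // odd: mark region dirty
    // Release stores keep the odd "dirty" seq store above from sinking below them.
    s1.x.store(x, std::memory_order_release);
    s1.y.store(y, std::memory_order_release);
    s1.seq.store(n, std::memory_order_release);      // even: publish
}

bool consume1(int64_t n, int &x, double &y)
{
    if (s1.seq.load(std::memory_order_acquire) != n)
        return false;
    // Acquire loads keep the seq re-check below from being hoisted above them.
    x = s1.x.load(std::memory_order_acquire);
    y = s1.y.load(std::memory_order_acquire);
    return s1.seq.load(std::memory_order_relaxed) == n;  // unchanged => consistent
}
```

This mirrors the N4455 writer shown further down: the release data stores keep the odd seq store first, and the acquire data loads keep the re-check last.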


void produce(int64_t n, ...) // ... for above data members
{
    /*********** changed lines ************/
    std::atomic_signal_fence(std::memory_order_release);  // compiler-barrier to make sure the compiler does the seq store as late as possible (to give the reader more time with it valid).
    s.seq.store(n-1, std::memory_order_relaxed);          // changed from release
    std::atomic_thread_fence(std::memory_order_release);  // StoreStore barrier prevents reordering of the above store with any below stores.  (It's also a LoadStore barrier)
    /*********** end of changes ***********/

    // assign data members of s
    ...

    // release semantics prevent any preceding stores from being delayed past here
    s.seq.store(n, std::memory_order_release); // complete updating
}



bool consume(int64_t n, ...) // ... for interested fields passed as reference
{
    if (n == s.seq.load(std::memory_order_acquire))
    {
        // acquire semantics prevent any reordering with following loads

        // read fields
        ...

    /*********** changed lines ************/
        std::atomic_thread_fence(std::memory_order_acquire);  // LoadLoad barrier (and LoadStore)
        auto t2 = s.seq.load(std::memory_order_relaxed);    // relaxed: it's ordered by the fence and doesn't need anything extra
        // std::atomic_signal_fence(std::memory_order_acquire);  // compiler barrier: probably not useful on the load side.
    /*********** end of changes ***********/
        if (n == t2)
            return true;
    }

    return false;
}

Notice the extra compiler barrier (signal_fence only affects compile-time reordering) to make sure the compiler doesn't merge the second store from one iteration with the first store from the next iteration, if this is run in a loop. Or more generally: to make sure the store that invalidates the region is done as late as possible, to reduce false positives. (This is probably not necessary with real compilers, and with plenty of code between calls to this function. But signal_fence never compiles to any instructions, and seems like a better choice than keeping the first store as mo_release. On architectures where a release-store and a thread-fence both compile to extra instructions, a relaxed store avoids having two separate barrier instructions.)

I was also worried about the possibility of the first store reordering with the release-store from the previous iteration. But I don't think that can ever happen, because both stores are to the same address. (At compile time, maybe the standard allows a hostile compiler to do it, but any sane compiler would instead just not do one of the stores at all if it thought one could pass the other.) At run time on a weakly-ordered architecture, I'm not sure stores to the same address can ever become globally visible out of order. This shouldn't be a problem in real life anyway, since the producer presumably isn't called back-to-back.
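To see the fence-based version end to end, here is a compilable sketch with the `...` placeholders filled in by a hypothetical payload field `x` (made a relaxed atomic, per the caveat above), plus a single-threaded driver that interleaves the producer's steps by hand so the reader's rejection of the dirty window can be checked deterministically:

```cpp
#include <atomic>
#include <cstdint>

// The fence-based version above, with a hypothetical relaxed-atomic payload x.
struct S2 {
    std::atomic<int64_t> seq{0};
    std::atomic<int64_t> x{0};
};

S2 s2;

void produce2(int64_t n, int64_t x)
{
    std::atomic_signal_fence(std::memory_order_release);  // compiler barrier
    s2.seq.store(n - 1, std::memory_order_relaxed);       // odd: mark dirty
    std::atomic_thread_fence(std::memory_order_release);  // StoreStore barrier
    s2.x.store(x, std::memory_order_relaxed);
    s2.seq.store(n, std::memory_order_release);           // even: publish
}

bool consume2(int64_t n, int64_t &x)
{
    if (s2.seq.load(std::memory_order_acquire) != n)
        return false;
    x = s2.x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);  // LoadLoad barrier
    return s2.seq.load(std::memory_order_relaxed) == n;
}

// Driver: performs the producer's steps by hand to exercise the dirty window.
bool run_demo()
{
    int64_t x = 0;
    produce2(2, 42);
    if (!consume2(2, x) || x != 42)
        return false;                            // clean publish must succeed
    s2.seq.store(3, std::memory_order_relaxed);  // producer marks dirty...
    if (consume2(4, x))
        return false;                            // ...so readers must fail
    s2.x.store(99, std::memory_order_relaxed);
    s2.seq.store(4, std::memory_order_release);  // publish n = 4
    return consume2(4, x) && x == 99;
}
```

In real use, a consumer would call consume2 in a retry loop, re-reading seq and trying again whenever it returns false.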


BTW, the synchronization technique you're using is a seqlock, but with only a single writer. You only have the sequence part, not the lock part that synchronizes separate writers. In a multi-writer version, writers would take the lock before reading/writing the sequence number and data. (And instead of taking the sequence number as a function arg, you'd read it from the lock.)
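A multi-writer variant might look like the following sketch, with a std::mutex standing in for the lock part (the payload field `x` and the function names are hypothetical). Writers serialize on the mutex; readers stay lock-free:

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

struct SeqLock {
    std::mutex writer_lock;          // serializes writers only
    std::atomic<uint64_t> seq{0};    // even = stable, odd = write in progress
    std::atomic<int64_t>  x{0};      // hypothetical payload, relaxed atomic
};

SeqLock sl;

void write(int64_t x)
{
    std::lock_guard<std::mutex> guard(sl.writer_lock);
    uint64_t s0 = sl.seq.load(std::memory_order_relaxed); // read seq under the lock
    sl.seq.store(s0 + 1, std::memory_order_relaxed);      // odd: mark dirty
    std::atomic_thread_fence(std::memory_order_release);  // order payload store after it
    sl.x.store(x, std::memory_order_relaxed);
    sl.seq.store(s0 + 2, std::memory_order_release);      // even again: publish
}

bool read(int64_t &x)
{
    uint64_t s0 = sl.seq.load(std::memory_order_acquire);
    if (s0 & 1)
        return false;                                     // write in progress, retry
    x = sl.x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);  // order payload load before re-check
    return sl.seq.load(std::memory_order_relaxed) == s0;  // unchanged => consistent snapshot
}
```

Because seq is read under the lock, concurrent writers can't both bump it from the same starting value, which is exactly the lock part the single-writer version omits.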

The C++ standards-discussion paper N4455 (about compiler optimizations of atomics; see the second half of my answer on Can num++ be atomic for 'int num'?) uses it as an example.

Instead of a StoreStore fence, they use release-stores for the data items in the writer. (With atomic data items, which, as I mentioned, is required for this to really be correct.)

void writer(T d1, T d2) {
  unsigned seq0 = seq.load(std::memory_order_relaxed);  // note that they read the current value because it's presumably a multiple-writers implementation.
  seq.store(seq0 + 1, std::memory_order_relaxed);
  data1.store(d1, std::memory_order_release);
  data2.store(d2, std::memory_order_release);
  seq.store(seq0 + 2, std::memory_order_release);
}

They talk about letting the reader's second load of the sequence number potentially reorder with later operations, if it's profitable for the compiler to do so, and about using t2 = seq.fetch_add(0, std::memory_order_release) in the reader as a potential way to get a load with release semantics. With current compilers, I would not recommend that: you're likely to get a locked operation on x86, where the approach I suggested above doesn't need any (or any actual barrier instructions at all, because only full-barrier seq_cst fences need an instruction on x86).
