
A readers/writer lock… without having a lock for the readers?

I get the feeling this may be a very general and common situation for which a well-known no-lock solution exists.

In a nutshell, I'm hoping there's an approach like a readers/writer lock, but one that doesn't require the readers to acquire a lock, and thus can have better average performance.

Instead there'd be some atomic operations (128-bit CAS) for a reader, and a mutex for a writer. I'd have two copies of the data structure: a read-only one for the normally-successful queries, and an identical copy to be updated under mutex protection. Once the data has been inserted into the writable copy, we make it the new readable copy. The old readable copy then gets the same insertion in turn: once all the pending readers have finished reading it, the writer spins on the number of readers left until it's zero, then modifies it in turn, and finally releases the mutex.

Or something like that.

Does anything along these lines exist?

If your data fits in a 64-bit value, most systems can cheaply read/write that atomically, so just use std::atomic<my_struct>.
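For example, here's a minimal sketch; the Pair struct is purely illustrative, standing in for any payload that packs into 8 bytes. On most 64-bit targets the loads and stores compile to plain MOV instructions:

#include <atomic>
#include <cstdint>

struct Pair {                  // 8 bytes total: fits in one 64-bit atomic
    uint32_t version;
    uint32_t value;
};

std::atomic<Pair> shared{ Pair{0, 0} };

void publish(uint32_t v) {
    Pair p = shared.load(std::memory_order_relaxed);
    shared.store(Pair{ p.version + 1, v }, std::memory_order_release);
}

Pair peek() {
    // worth asserting shared.is_lock_free() on your target
    return shared.load(std::memory_order_acquire);
}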

For smallish and/or infrequently-written data, there are a couple ways to make readers truly read-only on the shared data, not having to do any atomic RMW operations on a shared counter or anything. This allows read-side scaling to many threads without readers contending with each other (unlike a 128-bit atomic read on x86 using lock cmpxchg16b, or taking a RWlock).

Ideally just an extra level of indirection via an atomic<T*> pointer (RCU), or just an extra load + compare-and-branch (SeqLock); no atomic RMWs or memory barriers stronger than acq/rel or anything else in the read side.

This can be appropriate for data that's read very frequently by many threads, eg a timestamp updated by a timer interrupt but read all over the place. Or a config setting that typically never changes.

If your data is larger and/or changes more frequently, one of the strategies suggested in other answers, which requires a reader to still take a RWlock on something or atomically increment a counter, will be more appropriate. This won't scale perfectly because each reader still needs to get exclusive ownership of the shared cache line containing the lock or counter so it can modify it, but there's no such thing as a free lunch.

RCU

It sounds like you're half-way to inventing RCU (Read Copy Update), where you update a pointer to the new version.

But remember a lock-free reader might stall after loading the pointer, so you have a deallocation problem. This is the hard part of RCU. In a kernel it can be solved by having sync points where you know that there are no readers older than some time t, and thus can free old versions. There are some user-space implementations: see https://en.wikipedia.org/wiki/Read-copy-update and https://lwn.net/Articles/262464/.

For RCU, the less frequent the changes, the larger a data structure you can justify copying. eg even a moderate-sized tree could be doable if it's only ever changed interactively by an admin, while readers are running on dozens of cores all checking something in parallel. eg kernel config settings are one thing where RCU is great in Linux.
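To make the pointer-publishing part concrete, here's a minimal user-space sketch; Config and the function names are illustrative placeholders, not a real RCU library. Note that it deliberately leaks old versions, because freeing them safely is exactly the grace-period problem described above:

#include <atomic>
#include <string>

struct Config {                // hypothetical payload; could be arbitrarily large
    std::string logPath;
    int verbosity;
};

std::atomic<Config*> g_config{ new Config{"/var/log/app", 1} };

// Read side: one acquire load, no RMW, nothing stronger than acq/rel.
int readVerbosity() {
    Config* c = g_config.load(std::memory_order_acquire);
    return c->verbosity;
}

// Write side (serialized externally, eg by a mutex): copy, modify, publish.
void updateVerbosity(int v) {
    Config* old   = g_config.load(std::memory_order_relaxed);
    Config* fresh = new Config(*old);          // copy the whole structure
    fresh->verbosity = v;
    g_config.store(fresh, std::memory_order_release);
    // Deliberately leak `old` here: reclaiming it is the hard part that
    // real RCU solves with grace periods (or hazard pointers, epochs).
}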


SeqLock

If your data is small (eg a 64-bit timestamp on a 32-bit machine), another good option is a SeqLock. Readers check a sequence counter before/after a non-atomic copy of the data into a private buffer. If the sequence counters match, we know there wasn't tearing. (Writers mutually exclude each other with a separate mutex.) See Implementing 64 bit atomic counter with 32 bit atomics / How to implement a seqlock lock using c++11 atomic library.

It's a bit of a hack in C++ to write something that can compile efficiently to a non-atomic copy that might have tearing, because inevitably that's data-race UB. (Unless you use std::atomic<long> with mo_relaxed for each chunk separately, but then you're preventing the compiler from using movdqu or something to copy 16 bytes at once.)

A SeqLock makes the reader copy the whole thing (or ideally just load it into registers) every read, so it's only ever appropriate for a small struct or 128-bit integer or something. But for less than 64 bytes of data it can be quite good, better than having readers use lock cmpxchg16b for a 128-bit datum if you have many readers and infrequent writes.

It's not lock-free, though: a writer that sleeps while modifying the SeqLock could get readers stuck retrying indefinitely. For a small SeqLock the window is small, and obviously you want to have all the data ready before you do the first sequence-counter update, to minimize the chance of an interrupt pausing the writer in mid update.

The best case is when there's only 1 writer, so it doesn't have to do any locking; it knows nothing else will be modifying the sequence counter.
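Here's a minimal single-writer sketch along those lines, assuming a 128-bit payload stored as two relaxed 64-bit halves (the legal-but-less-vectorizable route mentioned above, avoiding the data-race UB of a raw non-atomic copy):

#include <atomic>
#include <cstdint>

struct SeqLocked {
    std::atomic<uint32_t> seq{0};          // even = stable, odd = write in progress
    std::atomic<uint64_t> lo{0}, hi{0};    // 128-bit payload as two relaxed halves
};

// Single writer: no CAS on seq, just two ordered updates bracketing the data.
void write(SeqLocked& s, uint64_t lo, uint64_t hi) {
    uint32_t q = s.seq.load(std::memory_order_relaxed);
    s.seq.store(q + 1, std::memory_order_relaxed);         // mark write in progress
    std::atomic_thread_fence(std::memory_order_release);   // odd store before data
    s.lo.store(lo, std::memory_order_relaxed);
    s.hi.store(hi, std::memory_order_relaxed);
    s.seq.store(q + 2, std::memory_order_release);         // back to even: stable
}

// Reader: retry until it sees the same even sequence number on both sides.
void read(const SeqLocked& s, uint64_t& lo, uint64_t& hi) {
    uint32_t q0, q1;
    do {
        q0 = s.seq.load(std::memory_order_acquire);
        lo = s.lo.load(std::memory_order_relaxed);
        hi = s.hi.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);  // data loads before recheck
        q1 = s.seq.load(std::memory_order_relaxed);
    } while ((q0 & 1) || q0 != q1);
}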

What you're describing is very similar to double instance locking and left-right concurrency control.

In terms of progress guarantees, the difference between the two is that the former is lock-free for readers while the latter is wait-free. Both are blocking for writers.

It turns out the two-structure solution I was thinking of has similarities to http://concurrencyfreaks.blogspot.com/2013/12/left-right-concurrency-control.html.

Here's the specific data structure and pseudocode I had in mind.

We have two copies of some arbitrary data structure called MyMap allocated, and two pointers out of a group of three pointers point to these two. Initially, one is pointed to by achReadOnly[0].pmap and the other by pmapMutable.

A quick note on achReadOnly: it has a normal state and two temporary states. The normal state will be (WLOG for cell 0/1):

achReadOnly = { { pointer to one data structure, number of current readers },
                { nullptr, 0 } }
pmapMutable = pointer to the other data structure

When we've finished mutating "the other," we store it in the unused slot of the array, as it is the next-generation read-only and it's fine for readers to start accessing it.

achReadOnly = { { pointer to one data structure, number of old readers },
                { pointer to the other data structure, number of new readers } }
pmapMutable = pointer to the other data structure

The writer then clears the pointer to "the one", the previous-generation read-only, forcing readers to go to the next-generation one. We move that to pmapMutable.

achReadOnly = { { nullptr, number of old readers },
                { pointer to the other data structure, number of new readers } }
pmapMutable = pointer to the one data structure

The writer then spins until the number of old readers hits one (itself), at which point it can receive the same update. That 1 is overwritten with 0 to clean up in preparation to move forward, though in fact it could be left dirty, as it won't be referred to before being overwritten.

struct CountedHandle {
    MyMap*   pmap;
    int      iReaders;
};

// Data Structure:
atomic<CountedHandle> achReadOnly[2];
MyMap* pmapMutable;
mutex_t muxMutable;

data Read( key ) {
    int iWhich = 0;
    CountedHandle chNow, chUpdate;

    // Spin if necessary to update the reader counter on a pmap, and/or
    // to find a pmap (as the pointer will be overwritten with nullptr once
    // a writer has finished updating the mutable copy and made it the next-
    // generation read-only in the other slot of achReadOnly[]).

    do {
        chNow = achReadOnly[ iWhich ];
        if ( !chNow.pmap ) {
            iWhich = 1 - iWhich;
            continue;
        }
        chUpdate = chNow;
        chUpdate.iReaders++;
    } while ( CAS( achReadOnly[ iWhich ], chNow, chUpdate ) fails );

    // Now we've found a map, AND registered ourselves as a reader of it atomicly.
    // Importantly, it is impossible any reader has this pointer but isn't
    // represented in that count.

    if ( data = chNow.pmap->Find( key ) ) {
        // Deregister ourselves as a reader.
        do {
            chNow = achReadOnly[ iWhich ];
            chUpdate = chNow;
            chUpdate.iReaders--;
        } while ( CAS( achReadOnly[ iWhich ], chNow, chUpdate ) fails );

        return data;
    }

    // OK, we have to add it to the structure.

    lock muxMutable;
    figure out data for this key
    pmapMutable->Add( key, data );

    // It's now the next-generation read-only.  Put it where readers can find it.
    achReadOnly[ 1 - iWhich ].pmap = pmapMutable;

    // Prev-generation readonly is our Mutable now, though we can't change it
    // until the readers are gone.
    pmapMutable = achReadOnly[ iWhich ].pmap;

    // Force readers to look for the next-generation readonly.
    achReadOnly[ iWhich ].pmap = nullptr;

    // Spin until all readers finish with previous-generation readonly.
    // Remember we added ourselves as reader so wait for 1, not 0.

    while ( achReadOnly[ iWhich ].iReaders > 1 )
        ;

    // Remove our reader count.
    achReadOnly[ iWhich ].iReaders = 0;

    // No more readers for previous-generation readonly, so we can now write to it.
    pmapMutable->Add( key, data );

    unlock muxMutable;

    return data;

}

Solution that has come to me:

Every thread has a thread_local copy of the data structure, and this can be queried at will without locks. Any time you find your data, great, you're done.

If you do NOT find your data, then you acquire a mutex for the master copy.

This will have potentially many new insertions in it from other threads (possibly including the data you need). Check to see if it has your data and, if not, insert it.

Finally, copy all the recent updates (including the entry for the data you need) to your own thread_local copy. Release the mutex and done.

Readers can read all day long, in parallel, even when updates are happening, without locks. A lock is only needed when writing (or sometimes when catching up). This general approach would work for a wide range of underlying data structures. QED.
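As a sketch of this scheme (assuming, purely for illustration, a string-to-int map and a hypothetical computeData() standing in for "figure out data for this key"):

#include <mutex>
#include <string>
#include <unordered_map>

using MyMap = std::unordered_map<std::string, int>;

int computeData(const std::string& key) {   // hypothetical stand-in for the real
    return static_cast<int>(key.size());    // work of producing a missing entry
}

MyMap      g_master;                        // master copy, only touched under g_mux
std::mutex g_mux;

int lookup(const std::string& key) {
    thread_local MyMap tls;                 // per-thread copy: read with no locks at all

    auto it = tls.find(key);
    if (it != tls.end())
        return it->second;                  // fast path: no lock, no atomics, no contention

    std::lock_guard<std::mutex> g(g_mux);   // slow path: consult/update the master
    auto mit = g_master.find(key);
    if (mit == g_master.end())
        mit = g_master.emplace(key, computeData(key)).first;

    tls = g_master;                         // catch up on all recent updates at once
    return mit->second;
}

Copying the whole master on a miss is the bluntest form of "copy all the recent updates"; the vector trick discussed below makes that catch-up incremental.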


Having many thread_local indexes sounds memory-inefficient if you have lots of threads using this structure.

However, the data found by the index, if it's read-only, need only have one copy, referred to by many indices. (Luckily, that is my case.)

Also, many threads might not be randomly accessing the full range of entries; maybe some only need a few entries and will very quickly reach a final state where their local copy of the structure can find all the data needed, before it grows much. And yet many other threads may not refer to this at all. (Luckily, that is my case.)

Finally, to "copy all the recent updates" it helps if all new data added to the structure is, say, pushed onto the end of a vector. Then, given that your local copy has 4000 entries while the master copy has 4020, you can locate the 20 objects you need to add with a few machine cycles. (Luckily, that is my case.)
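A sketch of that incremental catch-up, under the assumption that the master keeps an append-only vector of entries and each thread remembers how many it has already consumed (all names illustrative):

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Entry { std::string key; int data; };

std::vector<Entry> g_log;   // append-only master log, guarded by the writer mutex

// Call with the master mutex held: copies only the entries this thread hasn't
// seen yet, eg local at 4000 and master at 4020 costs exactly 20 emplacements.
void catchUp(std::unordered_map<std::string, int>& local, std::size_t& consumed) {
    for (; consumed < g_log.size(); ++consumed)
        local.emplace(g_log[consumed].key, g_log[consumed].data);
}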
