
rcu_read_lock and x86-64 memory ordering

On a preemptible SMP kernel, rcu_read_lock compiles to the following:

current->rcu_read_lock_nesting++;
barrier();

With barrier being a compiler directive that compiles to nothing.

So, according to Intel's x86-64 memory-ordering white paper:

Loads may be reordered with older stores to different locations

Why is the implementation actually OK?

Consider the following situation:

rcu_read_lock();
read_non_atomic_stuff();
rcu_read_unlock();

What prevents read_non_atomic_stuff from "leaking" forward past rcu_read_lock, causing it to run concurrently with the reclamation code running in another thread?

For observers on other CPUs, nothing prevents this. You're right, StoreLoad reordering of the store part of the ++ can make it globally visible after some of your loads.

Thus we can conclude that current->rcu_read_lock_nesting is only ever observed by code running on this core, by code that has remotely triggered a memory barrier on this core by getting scheduled here, or via a dedicated mechanism for getting all cores to execute a barrier in a handler for an inter-processor interrupt (IPI), e.g. similar to the membarrier() user-space system call.


If this core starts running another task, that task is guaranteed to see this task's operations in program order. (Because it's on the same core, and a core always sees its own operations in order.) Also, context switches might involve a full memory barrier so tasks can be resumed on another core without breaking single-threaded logic. (That would make it safe for any core to look at rcu_read_lock_nesting when this task / thread is not running anywhere.)

Notice that the kernel starts one RCU task per core of your machine; e.g. ps output shows [rcuc/0], [rcuc/1], ..., [rcuc/7] on my 4c8t quad-core machine. Presumably they're an important part of this design that lets readers be wait-free with no barriers.

I haven't looked into the full details of RCU, but one of the "toy" examples in https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt is "classic RCU", which implements synchronize_rcu() as for_each_possible_cpu(cpu) run_on(cpu);, to get the reclaimer to execute on every core that might have done an RCU operation (i.e. every core). Once that's done, we know that a full memory barrier must have happened somewhere in there as part of the switching.

So yes, RCU doesn't follow the classic method where you'd need a full memory barrier (including StoreLoad) to make the core wait until the first store was visible before doing any reads. RCU avoids the overhead of a full memory barrier in the read path. This is one of the major attractions for it, besides the avoidance of contention.
