
Is it possible for two lock() statements to be executed at the same time on two CPU cores? Do CPU cores tick at the same time?

I searched for an answer, and I know at a high level how to use lock etc. in a multithreading environment. This question has bugged me for a long time and I think I am not the only one. TL;DR at the end.

In my case I want to prevent a method that gets called from multiple threads from being executed while another thread is still executing it.

Now a normal lock scenario would look like this in C#:

static readonly object _locker = new object();

private static int _counter;

public void Increase()
{
  lock (_locker)
  {
    _counter++;//Do more than this here
  }
}

As I understand it, the object _locker acts as a bool, which indicates whether the method is currently being executed. If the method is "free", set it to locked, execute the method, and free it afterwards. If the method is locked, wait until it is unlocked, then lock, execute and unlock.

Side question 1: Does calling this method repeatedly guarantee queue-like behavior? Ignore the fact that the blocking in the parent thread could cause problems. Imagine Increase() is the last call in the parent thread.

Side question 2: Using an object in a boolean way feels odd. Does every object contain an "is-being-used" flag and a "raw" object is just being used for containing this flag? Why not Boolean?

Side question 3: How can lock() modify a readonly field?

Functionally I could also write it like this:

static Boolean _locker = false;

private static int _counter;

public void Increase()
{
  while(_locker)//Waits for lock==0
  {
  }

  _locker = true;//Sets lock=1

  _counter++;//Do more than this here

  _locker = false;//Sets lock=0
}

While the lock example looks sophisticated and safe, the second one just feels wrong, and somehow rings alarm bells in my head.

Is it possible for this method to be executed at the exact same CPU cycle by two cores simultaneously?

I know this is "but sometimes" taken to the extreme. I believe an OS scheduler does split threads from one application across multiple cores, so why shouldn't the assembly instruction "load value of _locker for comparison" be executed on two cores at the same time? Even if the method is entered one cycle apart, the "read for comparison" on one core and the "write true to _locker" on the other would be executed at the same time.

This doesn't even take into account that one line of C# could/will translate to multiple assembly instructions, and a thread could be interrupted between confirming locked==0 and writing locked=1. Since one line of C# can result in many assembly instructions, could even the lock() be interrupted?

Obviously these problems are somehow solved or avoided. I would really appreciate an explanation of where my thought process is wrong or what I am missing.

TL;DR: Can a lock() statement be executed at the exact same time by two CPU cores? I can't see how software could avoid this without a big performance impact.

Yes, two cores can take two different locks at the same time. The atomic RMW operation only needs a "cache lock", not a global bus lock, on modern CPUs. E.g. this test code (on Godbolt) is C++ code that compiles to a loop that just repeats an xchg [rdi], ecx, with each thread using a different std::atomic<int> object in a different cache line. The total runtime of the program on my i7-6700k is 463ms whether it runs on 1 or 4 threads, so that rules out any kind of system-wide bus lock, confirming that the CPU just uses a MESI cache-lock within the core doing the RMW to make sure it's atomic without disturbing operations of other cores. Uncontended locks scale perfectly when each thread is only locking/unlocking its own lock repeatedly.
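A rough C++ sketch of that kind of experiment, not the original Godbolt code: every thread repeats an atomic exchange on its own std::atomic<int>, padded so that each one sits in its own cache line. The names PaddedAtomic and run_uncontended are made up for illustration, and the 64-byte line size is an assumption that holds on typical x86 parts.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86, not guaranteed elsewhere).
struct alignas(64) PaddedAtomic {
    std::atomic<int> value{0};
};

// Hypothetical helper: every thread hammers its own atomic with exchange(),
// so no two threads ever touch the same cache line.
// Returns the sum of the final values, which equals num_threads.
int run_uncontended(std::size_t num_threads, int iterations) {
    assert(iterations > 0);
    std::vector<PaddedAtomic> slots(num_threads);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        threads.emplace_back([&slots, t, iterations] {
            for (int i = 0; i < iterations; ++i) {
                slots[t].value.exchange(1);  // atomic RMW (xchg on x86)
            }
        });
    }
    for (auto& th : threads) th.join();

    int sum = 0;
    for (auto& s : slots) sum += s.value.load();
    return sum;
}
```

Timing run_uncontended(1, n) against run_uncontended(4, n) on a machine with at least 4 free cores is one way to check the scaling claim for yourself; the numbers will depend on the hardware.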

Taking a lock that was last released by another core will stall this one for maybe hundreds of clock cycles (40 to 70 nanoseconds is a typical inter-core latency) while the RFO (Read For Ownership) completes and gets exclusive ownership of the cache line, but it won't have to retry or anything. An atomic RMW involves a memory barrier (on x86), so memory operations after the lock can't even get started, and the CPU core may be stalled for a while. There is significant cost here compared to normal loads/stores, which out-of-order exec can't hide as well as it hides some other things.


No, two cores can't take the same lock at the same time (see note 1); that's the whole point of a mutex. Correctly implemented ones don't have the same bug as your example, which spin-waits and then separately stores a true.

(Note 1: There are counted locks / semaphores that you can use to allow up to n threads into a critical section, for some fixed n, where the resource-management problem you want to solve is something other than simple mutual exclusion. But you're only talking about mutexes.)
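For contrast with a plain mutex, here is a minimal C++ sketch of such a counted lock, built from a mutex and a condition variable so it doesn't depend on C++20's std::counting_semaphore. CountingSemaphore and max_concurrency are illustrative names, not anything from the question.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Lets at most n threads hold a permit at once (n is fixed at construction).
class CountingSemaphore {
    std::mutex m_;
    std::condition_variable cv_;
    int permits_;
public:
    explicit CountingSemaphore(int n) : permits_(n) { assert(n > 0); }
    void acquire() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return permits_ > 0; });  // sleep until a permit is free
        --permits_;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++permits_;
        }
        cv_.notify_one();
    }
};

// Hypothetical helper: runs workers through a section limited to max_inside
// threads and returns the highest number of threads seen inside at once.
int max_concurrency(int num_threads, int max_inside, int iterations) {
    CountingSemaphore sem(max_inside);
    std::atomic<int> inside{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                sem.acquire();
                int now = inside.fetch_add(1) + 1;
                int seen = peak.load();
                // Raise peak to now unless another thread already recorded more.
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {
                }
                inside.fetch_sub(1);
                sem.release();
            }
        });
    }
    for (auto& th : threads) th.join();
    return peak.load();
}
```

With max_inside set to 1 this degenerates into ordinary mutual exclusion, which is the only case the question is about.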


The critical operation in taking a lock is an atomic RMW, for example x86 xchg [rcx], eax or lock cmpxchg [rcx], edx, that stores a 1 (true) and, as part of the same operation, checks what the old value was. (See: Can num++ be atomic for 'int num'?) In C++, that would mean using std::atomic<bool> lock; / old = lock.exchange(true);. In C#, you have Interlocked.Exchange(). That closes the race window your attempt contained, where two threads could exit the while(_locker){} loop and then both blindly store _locker = true.
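As a minimal C++ sketch of that idea (one possible spinlock, not how any particular runtime implements lock()), the check and the store are folded into a single exchange, so only the thread that sees the old value false proceeds. SpinLock and count_with_spinlock are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // exchange() stores true and returns the previous value in one atomic
        // step. If the previous value was already true, another thread holds
        // the lock and our store changed nothing, so we just try again.
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical helper: several threads increment a plain int that is
// protected only by the spinlock. Returns the final count.
int count_with_spinlock(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations > 0);
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Unlike the while(_locker){} / _locker = true version, there is no point at which two threads can both have observed "unlocked" and both go on to enter the critical section.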

Also note that rolling your own spin-loop has problems if you don't use volatile or Volatile.Read() to stop the compiler from assuming that no other threads are writing a variable you're reading/writing. (Without volatile, while(foo){} can optimize into if(!foo) infinite_loop{} by hoisting the apparently loop-invariant load out of the loop.)

(The other interesting part of implementing a lock is what to do if it's not available the first time you try: e.g. how long you keep spinning, using CPU time while waiting (and exactly how you spin, e.g. with the x86 pause instruction between read-only checks), before falling back to making a system call to give up the CPU to another thread or process and have the OS wake you back up when the lock is or might be available again. But that's all performance tuning; actually taking the lock revolves around an atomic RMW.)
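A hedged C++ sketch of that waiting strategy: spin on read-only loads for a while, then give up the CPU. Here std::this_thread::yield() stands in for a real "sleep until woken" system call (such as a futex wait), and the spin limit of 100 is an arbitrary placeholder, not a tuned value. BackoffSpinLock and count_with_backoff_lock are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class BackoffSpinLock {
    std::atomic<bool> locked_{false};
    static constexpr int kSpinLimit = 100;  // arbitrary placeholder value
public:
    bool try_lock() {
        // Cheap read first, atomic RMW only if the lock looks free.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }
    void lock() {
        for (;;) {
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Lock is held: wait with read-only checks, which don't need
            // exclusive ownership of the cache line. On x86 a pause
            // instruction would normally go inside this loop.
            int spins = 0;
            while (locked_.load(std::memory_order_relaxed)) {
                if (++spins >= kSpinLimit) {
                    std::this_thread::yield();  // stand-in for a blocking syscall
                    spins = 0;
                }
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical helper: same counting experiment, using std::lock_guard.
int count_with_backoff_lock(int num_threads, int iterations) {
    BackoffSpinLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<BackoffSpinLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

The atomic loads inside the wait loop also address the hoisting problem: the compiler can't treat an atomic load as loop-invariant the way it can a plain variable.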


Of course, if you're going to do any rolling-your-own, make the increment itself a lock-free atomic RMW with Interlocked.Increment(ref counter);, as per the example in MS's docs.
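The closest C++ counterpart to Interlocked.Increment is fetch_add on a std::atomic<int>. A small sketch, with count_with_atomic_increment as an invented helper name:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: no lock at all, the increment itself is the atomic RMW
// (lock add / lock xadd on x86). Returns the final count.
int count_with_atomic_increment(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations > 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Relaxed ordering is enough for a pure counter that guards
                // no other data.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

This only covers the literal counter++ case; once the "do more than this here" part touches other shared state, you are back to needing a lock around the whole section.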


Does every object contain an "is-being-used" flag and a "raw" object is just being used for containing this flag? Why not Boolean?

We know from object sizes that C# doesn't do that. Probably you should just use lock (counter){ counter++; } instead of inventing a separate lock object. Using a dummy object would make sense if you didn't have an existing object you wanted to manage, but instead some more abstract resource like calling into some function. (Correct me if I'm wrong, I don't use C#; I'm just here for the cpu-architecture and assembly tags. Does lock() require an object, not a primitive type like int?)

I'd guess that they instead do what normal C++ implementations of std::atomic<T> do for objects too large to be lock-free: a hash table of actual mutexes or spinlocks, indexed by C# object address. (See: Where is the lock for a std::atomic?)
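To make that mental model concrete, here is a small C++ sketch of an address-indexed lock table. It illustrates the guess only; it is not a description of what the C# runtime or any std::atomic implementation actually does. The table size of 16 and the names lock_for, with_object_lock and count_with_lock_table are all made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kNumLocks = 16;  // arbitrary table size for illustration

// Maps an object's address to one of a fixed set of mutexes. Different
// objects can land on the same mutex; the same object always gets the same one.
std::mutex& lock_for(const void* object) {
    static std::array<std::mutex, kNumLocks> table;
    std::size_t h = std::hash<const void*>{}(object);
    return table[h % kNumLocks];
}

// Runs body(object) while holding the mutex that the object's address maps to.
template <typename T, typename F>
void with_object_lock(T& object, F body) {
    std::lock_guard<std::mutex> guard(lock_for(&object));
    body(object);
}

// Hypothetical helper: the int has no lock of its own, yet it can be "locked".
int count_with_lock_table(int num_threads, int iterations) {
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                with_object_lock(counter, [](int& c) { ++c; });
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Note that nesting with_object_lock on two objects that happen to share a table slot would deadlock in this sketch, because std::mutex is not recursive.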

Even if that guess isn't exactly what C# does, that's the kind of mental model that can make sense of this ability to lock anything without reserving space in every object.

This can create extra contention (by using the same mutex for two different objects). It could even introduce deadlocks where there shouldn't have been any, which is something the implementation would have to work around. Perhaps by putting the identity of the object being locked into the mutex, so another thread that indexes the same mutex can see that it's actually being used to lock a different object, and then do something about it... This is perhaps where being a "managed" language comes in; Java apparently does the same thing, where you can lock any object without having to define a separate lock.

(C++ std::atomic doesn't have this problem because the mutexes are taken/released inside library functions, with no possibility to try to take two locks at the same time.)


Do CPU cores tick at the same time?

Not necessarily, e.g. Intel "server" chips (most Xeons) let each core control its frequency-multiplier independently. However, even in a multi-socket system, the clock for all cores is normally still derived from the same source, so they can keep their TSC (which counts reference cycles, not core clocks) synced across cores.

Intel "client" chips, i.e. desktop/laptop chips such as the i7-6700, actually do use the same clock for all cores. A core is either in low-power sleep (clock halted) or running at the same clock frequency as any other active cores.

None of this has anything to do with locking, or with making atomic RMW operations truly atomic, and it probably should be split off to a separate Q&A. I'm sure there are plenty of non-x86 examples, but I happen to know how Intel CPUs do things.
