What's the point of cache coherency?

On CPUs like x86, which provide cache coherency, how is this useful from a practical perspective? I understand that the idea is to make memory updates done on one core immediately visible on all other cores. This is a useful property. However, one can't rely too heavily on it if not writing in assembly language, because the compiler can store variable assignments in registers and never write them to memory. This means that one must still take explicit steps to make sure that stuff done in other threads is visible in the current thread. Therefore, from a practical perspective, what has cache coherency achieved?
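To make the "explicit steps" concrete, here is a minimal C++ sketch (the names are illustrative, not from the question): a std::atomic flag both stops the compiler from keeping the flag in a register and orders the plain write to payload so that the other thread is guaranteed to see it.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0;

void producer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publish: the "explicit step"
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // re-reads memory every iteration
    assert(payload == 42);                              // guaranteed visible after the acquire load
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}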

The short story is, non-cache-coherent systems are exceptionally difficult to program, especially if you want to maintain efficiency - which is also the main reason even most NUMA systems today are cache-coherent.

If the caches weren't coherent, the "explicit steps" would have to enforce the coherency themselves - explicit steps are usually things like critical sections/mutexes (e.g. volatile in C/C++ is rarely enough). It's quite hard, if not impossible, for services such as mutexes to keep track of only the memory that has changed and needs to be updated in all the caches - they would probably have to update all of the memory, and that is assuming they could even track which cores have which pieces of that memory in their caches.

Presumably the hardware can do a much better and more efficient job of tracking the memory addresses/ranges that have been changed, and keeping them in sync.

Also, imagine a process running on core 1 that gets preempted. When it gets scheduled again, it gets scheduled on core 2.

This would be pretty fatal if the caches weren't coherent, as there might be remnants of the process's data in core 1's cache that don't exist in core 2's cache. For systems working that way, the OS would have to enforce cache coherency as threads are scheduled - which would probably be an "update all the memory in the caches of all the cores" operation, or perhaps it could track dirty pages with the help of the MMU and only sync the memory pages that have changed - again, the hardware likely keeps the caches coherent in a more fine-grained and efficient way.

There are some nuances not covered by the great responses from the other authors.

First off, consider that a CPU doesn't deal with memory byte-by-byte, but with cache lines. A line might be 64 bytes. Now, if I allocate a 2-byte piece of memory at location P, and another CPU allocates an 8-byte piece of memory at location P + 8, and both P and P + 8 live on the same cache line, observe that without cache coherence the two CPUs can't concurrently update P and P + 8 without clobbering each other's changes! Because each CPU does a read-modify-write on the cache line, they might both write out a copy of the line that doesn't include the other CPU's changes! The last writer would win, and one of your modifications to memory would have "disappeared"!
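As a sketch of that scenario (assuming 64-byte cache lines; the struct layout is illustrative), the two fields below almost certainly share one line. On coherent hardware both writes survive even though each core performs a read-modify-write of the whole line; without coherency, whichever core wrote the line back last could silently undo the other core's update.

#include <cstdio>
#include <thread>

struct Shared {
    char a;       // the small allocation at P
    char pad[7];
    long b;       // the allocation at P + 8, on the same cache line
} s;

int main() {
    std::thread t1([] { s.a = 1; });  // one core updates its own field
    std::thread t2([] { s.b = 2; });  // another core updates the neighbouring field
    t1.join();
    t2.join();
    std::printf("a=%d b=%ld\n", s.a, s.b);  // a=1 b=2 on coherent hardware
}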

The other thing to bear in mind is the distinction between coherency and consistency. Because even x86-derived CPUs use store buffers, there isn't the guarantee you might expect that instructions that have already finished have modified memory in such a way that other CPUs can see those modifications, even if the compiler has decided to write the value back to memory (maybe because of volatile?). Instead, the modifications may be sitting around in store buffers. Pretty much all CPUs in general use are cache coherent, but very few CPUs have a consistency model that is as forgiving as the x86's. Check out, for example, http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html for more information on this topic.
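The classic store-buffer litmus test illustrates the difference (a sketch; you typically need many runs to actually observe the reordering). Both loads can execute while both stores are still sitting in their cores' store buffers, so r1 == 0 && r2 == 0 is a legal outcome on x86 even though its caches are coherent; using memory_order_seq_cst for all four accesses rules that outcome out.

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

int main() {
    std::thread t1([] {
        x.store(1, std::memory_order_release);
        r1 = y.load(std::memory_order_acquire);  // may run before t2's store is visible
    });
    std::thread t2([] {
        y.store(1, std::memory_order_release);
        r2 = x.load(std::memory_order_acquire);  // may run before t1's store is visible
    });
    t1.join();
    t2.join();
    std::printf("r1=%d r2=%d\n", r1, r2);  // r1=0 r2=0 is allowed with these orderings
}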

Hope this helps, and BTW, I work at Corensic, a company that's building a concurrency debugger that you may want to check out. It helps pick up the pieces when assumptions about concurrency, coherence, and consistency prove unfounded :)

Imagine you do this:

lock(); //some synchronization primitive e.g. a semaphore/mutex
globalint = somevalue;
unlock();

If there were no cache coherence, that last unlock() would have to ensure that globalint is now visible everywhere; with cache coherence all you need to do is write it to memory and let the hardware do the magic. A software solution would have to keep track of which memory exists in which caches, on which cores, and somehow make sure they're atomically in sync.
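For completeness, here is a runnable C++ version of the pseudocode above, using std::mutex as the synchronization primitive (globalint and somevalue match the snippet; the rest is illustrative). The unlock only has to establish ordering; cache coherency takes care of actually propagating the bytes between cores.

#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;
int globalint = 0;

void writer(int somevalue) {
    std::lock_guard<std::mutex> guard(m);  // lock()
    globalint = somevalue;
}                                          // unlock() when guard goes out of scope

int main() {
    std::thread t(writer, 123);
    t.join();
    std::lock_guard<std::mutex> guard(m);
    std::printf("%d\n", globalint);        // prints 123
}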

You'd win an award if you could find a software solution that keeps track of all the pieces of memory in the caches that need to be kept in sync, and that is more efficient than the current hardware solution.

Cache coherency becomes extremely important when you are dealing with multiple threads that access the same variable. In that particular case, you have to ensure that all processors/cores see the same value if they access the variable at the same time; otherwise you'll have wonderfully non-deterministic behaviour.
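A small sketch of that non-determinism (names are illustrative): two threads incrementing one counter. With a plain int this is a data race and the result varies from run to run; std::atomic makes every increment a read-modify-write that all cores agree on.

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> counter{0};  // with a plain int the final value becomes unpredictable

int main() {
    auto work = [] { for (int i = 0; i < 100000; ++i) counter.fetch_add(1); };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    std::printf("%d\n", counter.load());  // always 200000 with the atomic counter
}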

It's not needed for locking. The locking code would include cache flushing if that were needed. It's mainly needed to ensure that concurrent updates by different processors to different variables in the same cache line aren't lost.

Cache coherency is implemented in hardware so that the programmer doesn't have to worry about making sure all threads see the latest value of a memory location while operating in a multicore/multiprocessor environment. Cache coherence gives the abstraction that all cores/processors are operating on a single unified cache, even though every core/processor has its own individual cache.

It also makes sure that legacy multi-threaded code works as-is on new processor models/multiprocessor systems, without any code changes to ensure data consistency.
