What is L1/L2 cache behavior for LUTs and the like?

Assume a LUT of, say, 512 KB of 64-bit doubles. Generally speaking, how does the CPU cache such a structure in L1 or L2?
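For concreteness, such a table is just a contiguous array: 512 KB / 8 bytes per double = 65536 entries (a sketch; the name lut is hypothetical):

    // 512 * 1024 bytes / sizeof(double) = 65536 entries, laid out contiguously.
    static double lut[512 * 1024 / sizeof(double)];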

For example: if I access the middle element, does it attempt to cache the whole LUT, or just some of it, say the middle element and then n subsequent elements?

What kind of algorithm does the CPU use to determine what it keeps in the L2 cache? Is there a certain look-ahead strategy it follows?

Note: I'm assuming x86, but I'd be interested to know how other architectures (POWER, SPARC, etc.) work.

It depends on the data structure you use for the LUT (look-up table?).

Caches are at their best with things that are laid out contiguously in memory (e.g. as arrays or std::vectors) rather than scattered around.

In simple terms, when you access a memory location, a block of RAM (one "cache line" worth, 64 bytes on x86) is loaded into the cache, possibly evicting some previously cached data.
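A rough sketch of what that means for the LUT above (assuming 64-byte lines and a 64-byte-aligned array; the helper name is hypothetical): touching one double makes the 7 other doubles on the same line cheap to read.

    #include <cstddef>

    constexpr std::size_t kLine = 64;                         // x86 cache-line size
    constexpr std::size_t kPerLine = kLine / sizeof(double);  // 8 doubles per line

    // Sum the 8 doubles that share a cache line with lut[i]: the first
    // access may miss and pull in the whole 64-byte line, after which
    // the other 7 reads are L1 hits.
    double sum_line_of(const double* lut, std::size_t i) {
        std::size_t start = (i / kPerLine) * kPerLine;  // round down to line start
        double s = 0.0;
        for (std::size_t k = 0; k < kPerLine; ++k)
            s += lut[start + k];
        return s;
    }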

Generally, there are several levels of cache, forming a hierarchy. With each level, access times increase, but so does capacity.
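Rough ballpark figures for a modern x86 desktop part (exact numbers vary by microarchitecture, so treat these as illustrative only):

- L1 data: around 32 KB per core, roughly 4 cycles
- L2: around 256 KB to 1 MB per core, roughly 12 cycles
- L3: several MB, shared, roughly 40 cycles
- Main memory: on the order of 100 ns or more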

Yes, there is look-ahead (prefetching), but it is limited by rather simplistic algorithms and by the inability to cross page boundaries (a memory page is typically 4 KB in size on x86).
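Sequential walks usually engage the hardware prefetcher on their own; for irregular LUT indices you can add an explicit hint. A sketch using the GCC/Clang __builtin_prefetch intrinsic (the look-ahead distance of 8 is an illustrative guess, not a tuned value):

    // Gather lut[idx[i]] for i in [0, n), prefetching a few iterations ahead.
    void gather(const double* lut, const int* idx, double* out, int n) {
        for (int i = 0; i < n; ++i) {
            if (i + 8 < n)
                __builtin_prefetch(&lut[idx[i + 8]]);  // hint only; may be ignored
            out[i] = lut[idx[i]];
        }
    }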

I suggest that you read What Every Programmer Should Know About Memory. It has lots of great info on the subject.

Caches are generally organized as a collection of cache lines. Each cache line's granularity matches its size, so, for example, a cache with 128-byte cache lines will have the addresses it caches data for aligned to 128 bytes.
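In other words, the line containing an address is found by masking off the low address bits. A sketch using the 128-byte line size from the example above:

    #include <cstdint>

    constexpr std::uintptr_t kLineSize = 128;  // line size from the example above

    // First byte of the cache line that holds address p.
    const void* line_base(const void* p) {
        auto a = reinterpret_cast<std::uintptr_t>(p);
        return reinterpret_cast<const void*>(a & ~(kLineSize - 1));
    }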

CPU caches generally use some LRU eviction mechanism (least recently used, as in: on a cache miss, evict the least recently used line in the set), along with a mapping from each memory address to a particular set of cache lines. (On x86 this leads to conflict misses, one of several aliasing problems often confused with false sharing, if you try to read from multiple addresses aligned on a 4K or 16M boundary.)
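A sketch of that address-to-set mapping for an illustrative 32 KB, 8-way, 64-byte-line cache (the numbers are assumptions, not a specific CPU). With 64 sets, addresses 4 KB apart land in the same set, which is exactly why reads aligned on such boundaries can thrash it:

    #include <cstdint>
    #include <cstddef>

    constexpr std::size_t kLineSz  = 64;
    constexpr std::size_t kWays    = 8;
    constexpr std::size_t kCacheSz = 32 * 1024;
    constexpr std::size_t kSets    = kCacheSz / (kLineSz * kWays);  // 64 sets

    // Which set an address maps to; kSets * kLineSz = 4096, so addresses
    // 4 KB apart collide in the same set and compete for its 8 ways.
    std::size_t set_of(std::uintptr_t addr) {
        return (addr / kLineSz) % kSets;
    }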

So, when you have a cache miss, the CPU will read in a cache line of memory that includes the address range that missed. If you happen to read across a cache-line boundary, that means you will read in two cache lines.
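A tiny sketch of such a straddling read (assuming 64-byte lines; buf is a hypothetical buffer):

    #include <cstring>

    alignas(64) char buf[128];

    // Bytes 60..67 straddle the 64-byte line boundary, so a miss here
    // fetches two cache lines instead of one.
    double straddling_read() {
        double d;
        std::memcpy(&d, buf + 60, sizeof d);
        return d;
    }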
