

Is it possible to use CPU cache in Golang?

Original post: 2016-04-10 05:05:31, tagged go

Consider a memory- and CPU-intensive task:
For example, a Task Block reads 16 bytes from memory, does some CPU work, then writes the result back to memory.
This Task Block is parallelizable, meaning each core can run one Task Block.
For example, 8 CPUs need 8 * 16 bytes of cache, concurrently.
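A minimal sketch of the Task Block setup described above, assuming the data lives in one []byte buffer. processBlock is a hypothetical stand-in for the "CPU job" (an XOR plus a one-bit rotation), and processParallel is likewise an illustrative name. Each goroutine gets a contiguous run of 16-byte blocks rather than interleaved ones: with an assumed 64-byte cache line, interleaving would put blocks written by four different cores on the same line.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// blockSize is the 16-byte unit of work from the question.
const blockSize = 16

// processBlock is a hypothetical placeholder for the CPU job: it reads a
// 16-byte block, transforms each byte, and writes the result back in place.
func processBlock(b []byte) {
	for i := range b {
		v := b[i] ^ 0x5A
		b[i] = v<<1 | v>>7 // rotate left by one bit
	}
}

// processParallel splits buf into 16-byte blocks and gives each goroutine
// (one per CPU) a contiguous run of blocks, so each core mostly writes to
// its own cache lines. Trailing bytes that do not fill a block are left as-is.
func processParallel(buf []byte) {
	workers := runtime.NumCPU()
	nBlocks := len(buf) / blockSize
	per := (nBlocks + workers - 1) / workers // blocks per worker, rounded up

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * per
		end := start + per
		if end > nBlocks {
			end = nBlocks
		}
		if start >= end {
			break // no work left for the remaining workers
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				processBlock(buf[i*blockSize : (i+1)*blockSize])
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	buf := make([]byte, 1<<20) // 1 MiB = 65536 blocks
	for i := range buf {
		buf[i] = byte(i)
	}
	processParallel(buf)
	fmt.Println(buf[:4])
}
```

Each core has its own L1 cache, so eight 16-byte working sets do not compete for space; the layout choice mainly matters for which cache lines different cores end up writing.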

1 answer

Yes. Just like all other code running on your machine, Go code uses the CPU cache.

It's much too broad a question to tell you how to code your app so it makes the most efficient use of the cache. I highly recommend setting up Go benchmarks, then refactoring your code and comparing times. (Note: do not benchmark within a VM. VMs of any kind, on any platform, do not have clocks accurate enough for Go's benchmarking. Run all benchmarks natively on your OS instead.)
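One way to follow that benchmarking advice, sketched under the assumption that the cache effect you want to measure is memory access order: the two hypothetical functions below sum the same matrix row by row and column by column. Normally a benchmark lives in a _test.go file as func BenchmarkXxx(b *testing.B) and is run with go test -bench=. ; testing.Benchmark is used here only so the sketch runs as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

const n = 1024

// grid is a hypothetical n x n matrix stored in one contiguous slice.
var grid = make([]int64, n*n)

// sumRows walks memory sequentially, so every value in a fetched
// cache line is used before moving on.
func sumRows(g []int64) int64 {
	var s int64
	for r := 0; r < n; r++ {
		for c := 0; c < n; c++ {
			s += g[r*n+c]
		}
	}
	return s
}

// sumCols reads the same elements, but consecutive reads are n*8 bytes
// apart, so it tends to use only one value from each fetched line.
func sumCols(g []int64) int64 {
	var s int64
	for c := 0; c < n; c++ {
		for r := 0; r < n; r++ {
			s += g[r*n+c]
		}
	}
	return s
}

var sink int64 // keeps results live so the loops are not optimized away

func main() {
	for i := range grid {
		grid[i] = int64(i % 7)
	}
	rows := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumRows(grid)
		}
	})
	cols := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumCols(grid)
		}
	})
	fmt.Println("row-major:   ", rows)
	fmt.Println("column-major:", cols)
}
```

On most hardware the row-major loop is typically faster, but the size of the gap depends on your CPU and cache sizes, which is exactly why measuring beats guessing.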

It all comes down to your ability to code the application to make efficient use of that CPU cache. This is a much broader topic: how you use your variables, how often they get updated, what stays on the stack versus what escapes to the heap and has to be garbage-collected, how often that happens, etc.
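To see what stays on the stack and what escapes to the heap, the compiler reports its escape-analysis decisions with go build -gcflags=-m. The sketch below, built around a hypothetical point type, also counts heap allocations at run time with testing.AllocsPerRun: returning a value needs no allocation, while storing a pointer to a local in a global forces the local onto the heap.

```go
package main

import (
	"fmt"
	"testing"
)

// point is a small hypothetical value type (32 bytes).
type point struct{ x, y, z, w int64 }

var (
	valSink point  // global value: assigning to it just copies the struct
	ptrSink *point // global pointer: whatever it points to must live on the heap
)

// newValue returns a copy, so escape analysis can keep p on the stack.
func newValue() point {
	p := point{1, 2, 3, 4}
	return p
}

// newPointer returns the address of a local, so p escapes and the
// compiler allocates it on the garbage-collected heap.
func newPointer() *point {
	p := point{1, 2, 3, 4}
	return &p
}

func main() {
	// testing.AllocsPerRun reports the average number of heap
	// allocations per call of the function it is given.
	stack := testing.AllocsPerRun(1000, func() { valSink = newValue() })
	heap := testing.AllocsPerRun(1000, func() { ptrSink = newPointer() })
	fmt.Printf("by value: %.0f allocs/op, by pointer: %.0f allocs/op\n", stack, heap)
}
```

Fewer heap objects means less work for the garbage collector, and frequently used stack data tends to stay hot in cache.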

Here is one tiny example to point you in the right direction for reading more about writing code that uses the L1 and L2 caches efficiently.

On most current CPUs (typical x86-64 and many ARM cores), cache lines ("rows") are 64 bytes, not 64 bits. If you want to store four 16-bit Int16s, they will typically be allocated on the stack and will most likely all sit on the same cache line.

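Say you want to update one of the Int16s? The cache tracks data per line, not per value: a core has to own the whole line before it can write any part of it, so the write invalidates that entire line in every other core's cache, including the 3 Int16s you did not touch. If other cores are reading or writing those neighbors, each of them has to fetch the whole line again, and the line keeps bouncing between cores. This is known as false sharing.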

Very inefficient.

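One solution to that problem is to make sure each value sits on its own cache line, so a write invalidates only 1 line and the other 3 values stay in cache for quick reads. Switching to Int64s alone does not achieve this, because four 8-byte values (32 bytes) still fit in a single 64-byte line; each value has to be padded out to a full line. Are you doing more pushes or pops? And so on.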

Again, it highly depends on your use case: this may even slow things down if there is a lot of context switching around those 4 ints (e.g. mutex locks), in which case that is a whole different problem to optimize.

I recommend reading up on high-frequency scaling and on memory allocation on the stack and the heap.