
When I have per-CPU data structures, does it improve performance to have them on different pages?

I have a small struct of per-CPU data in a Linux kernel module, where each CPU frequently writes and reads its own data. I know that I need to make sure these items of data aren't on the same cache line, because if they were, the cores would be forever dirtying each other's caches. However, is there anything at the page level that I need to worry about from an SMP performance point of view? I.e., would there be any performance impact from padding these per-CPU structures out to 4096 bytes and aligning them?
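For illustration (not the asker's actual code), a cache-line-padded layout of the kind described above might look roughly like the following sketch, with made-up identifiers:

    #include <linux/cache.h>
    #include <linux/smp.h>
    #include <linux/types.h>

    /* Hypothetical per-CPU stats kept in a plain array; the alignment
     * attribute rounds the struct size up to a full cache line, so no two
     * CPUs' entries share a line (avoiding false sharing). */
    struct my_cpu_stats {
            u64 reads;
            u64 writes;
    } ____cacheline_aligned_in_smp;

    static struct my_cpu_stats my_stats[NR_CPUS];

    static void record_write(void)
    {
            int cpu = get_cpu();    /* disables preemption */

            my_stats[cpu].writes++;
            put_cpu();
    }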

This is on Linux 2.6 on x86_64.

(Points about whether it's worth optimising and suggestions that I go benchmark it aren't needed -- what I'm looking for is whether there's any theoretical basis for worrying about page alignment.)

Within a single NUMA node, different pages are only helpful if you want to apply different permissions, or map them individually into processes. For performance issues, being on different cache lines is sufficient.

On NUMA architectures, you may want to place a CPU's per-CPU structure on a page that is local to that CPU's node -- but you still wouldn't pad the structure out to a page size to achieve that, because you can place the structures for multiple CPUs within the same NUMA node on the same page.

Even on a NUMA system, you probably won't benefit much by allocating memory pages local to each CPU (use kmalloc_node(), if you're curious).

Node-local memory will be faster, but only in the case where it misses at all cache levels. For anything used with any frequency, you probably won't be able to tell the difference. If you're allocating megabytes of CPU-local data, then it probably makes sense to allocate pages local to each CPU.
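As a rough sketch of what node-local allocation could look like -- kmalloc_node() and cpu_to_node() are the real kernel interfaces, while the surrounding names are made up for illustration:

    #include <linux/slab.h>
    #include <linux/smp.h>
    #include <linux/topology.h>
    #include <linux/cache.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    struct my_cpu_data {
            u64 counter;
    } ____cacheline_aligned_in_smp;

    static struct my_cpu_data *my_data[NR_CPUS];

    static int my_alloc_per_cpu(void)
    {
            int cpu;

            for_each_possible_cpu(cpu) {
                    /* Allocate each CPU's structure from its own NUMA node */
                    my_data[cpu] = kmalloc_node(sizeof(*my_data[cpu]),
                                                GFP_KERNEL, cpu_to_node(cpu));
                    if (!my_data[cpu])
                            return -ENOMEM;   /* error unwinding omitted */
            }
            return 0;
    }

Note that structures for CPUs sharing a NUMA node may still land on the same page here, which is fine per the earlier answer: node locality, not page exclusivity, is what matters.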

Well, I've read a fair bit about Linux having NUMA support these days. In a NUMA setup, it would be helpful if the data for each CPU was located on a page that is local to that CPU.

percpu generally makes sure that they don't share a cache line. Otherwise commits like 7489aec8eed4f2f1eb3b4d35763bd3ea30b32ef5 would have been pretty useless.
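For reference, a minimal sketch of the percpu interface this refers to (the variable and function names are made up; DEFINE_PER_CPU, get_cpu_var()/put_cpu_var() and per_cpu() are the actual kernel macros):

    #include <linux/percpu.h>
    #include <linux/smp.h>
    #include <linux/cpumask.h>

    /* Each CPU gets its own copy in its per-CPU area, so the copies do not
     * end up sharing a cache line between CPUs. */
    static DEFINE_PER_CPU(unsigned long, my_hits);

    static void bump(void)
    {
            /* get_cpu_var() disables preemption and yields this CPU's copy */
            get_cpu_var(my_hits)++;
            put_cpu_var(my_hits);
    }

    static unsigned long total_hits(void)
    {
            unsigned long sum = 0;
            int cpu;

            for_each_online_cpu(cpu)
                    sum += per_cpu(my_hits, cpu);
            return sum;
    }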
