Thread scheduling under non-uniform memory access times

The specifics are obviously OS-dependent, but I'm looking for the algorithms used to assign threads to physical cores on non-uniform memory access (NUMA) architectures, i.e. architectures where accessing different addresses takes different amounts of time. This could happen, for instance, because the cache has been divided into physically distributed slices, each placed at a different location, so that each slice has a different access time depending on its distance from the core.

Obviously, the scheduler also takes many other variables into account, such as the number of threads already assigned to each processor, but I'm specifically looking for scheduling algorithms that primarily try to minimize memory access time on NUMA architectures.
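To make the cost model in the question concrete, here is a minimal sketch under loudly stated assumptions: a hypothetical latency table indexed by core and cache slice, and a per-thread count of accesses to each slice. Neither input is something an OS exposes in this form; a real scheduler would have to estimate them and weigh the result against load, which this sketch deliberately ignores.

```python
from typing import Sequence

# Hypothetical latencies: LATENCY[c][s] = cost for core c to reach cache slice s.
LATENCY = [
    [10, 20, 30],
    [20, 10, 20],
    [30, 20, 10],
]

def expected_access_cost(core: int, latency: Sequence[Sequence[float]],
                         accesses: Sequence[int]) -> float:
    """Latency-weighted access count if the thread runs on `core`.

    accesses[s] is how many times the thread touches slice s (assumed known)."""
    return sum(n * latency[core][s] for s, n in enumerate(accesses))

def best_core(latency: Sequence[Sequence[float]], accesses: Sequence[int]) -> int:
    """Core minimizing expected access time; load is deliberately ignored."""
    return min(range(len(latency)),
               key=lambda c: expected_access_cost(c, latency, accesses))
```

With these numbers, a thread that mostly touches slice 0 is placed on core 0, and one that mostly touches slice 2 on core 2.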

I can't claim to be an expert on the topic, but since no one else seems eager to answer so far, I will give it my best shot.

It seems reasonable to assume that, on a NUMA system, it is beneficial to keep a thread running on the same core for as long as possible. This essentially amounts to a weak form of processor affinity, in which the scheduler decides which core a thread should run on and may change that decision dynamically.
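One way to express this weak affinity, as a sketch: the scheduler remembers the last core a thread ran on and keeps it there unless that core's run queue is longer than the least-loaded one by more than a threshold. The slack threshold and the use of queue length as the only load signal are assumptions made for illustration.

```python
from typing import Optional, Sequence

def pick_core(last_core: Optional[int], queue_lengths: Sequence[int],
              slack: int = 2) -> int:
    """Weak affinity: keep a thread on its previous core unless that core is
    busier than the least-loaded core by more than `slack` queued threads."""
    least_loaded = min(range(len(queue_lengths)), key=lambda c: queue_lengths[c])
    if last_core is None:          # first time scheduled: no affinity yet
        return least_loaded
    if queue_lengths[last_core] - queue_lengths[least_loaded] > slack:
        return least_loaded        # imbalance too large: migrate
    return last_core               # otherwise stay put (warm cache, local memory)
```

The slack is what makes the affinity "weak": small imbalances are tolerated to preserve locality, large ones trigger a move.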

Basic scheduling with processor affinity is easy enough to implement: take an existing scheduling algorithm and modify it so that each core has its own thread queue (or queues). On a NUMA system, the rest is a matter of determining when it is beneficial to migrate a thread to another core. I don't think a generally applicable algorithm can be given for that, because the benefits and costs depend heavily on the specifics of the system in question.
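A minimal sketch of per-core run queues plus a migration test, in the spirit of the paragraph above. Everything here is a hypothetical assumption for illustration: the Task and Core types, the idea that each task has a known "home" NUMA node, the distance table (10 local / 20 remote merely mimics the style of ACPI SLIT distances), and the queue_weight that converts queue imbalance into the same arbitrary units as distance. Only the shape of the decision matters: migrate when the load-balancing benefit exceeds the extra memory-access cost, which is exactly the system-specific trade-off that resists a general answer.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Sequence

@dataclass
class Task:
    name: str
    home_node: int  # NUMA node holding most of this task's memory (assumed known)

@dataclass
class Core:
    node: int                                       # NUMA node this core sits on
    runqueue: deque = field(default_factory=deque)  # this core's own queue of Tasks

def try_migrate(src: Core, dst: Core, distance: Sequence[Sequence[int]],
                queue_weight: int = 10) -> bool:
    """Move one task from src to dst if the load-balancing benefit exceeds
    the extra memory-access cost. distance[a][b] is an assumed relative cost
    for a core on node a to reach memory on node b; queue_weight converts one
    task of queue imbalance into the same (arbitrary) units."""
    imbalance = len(src.runqueue) - len(dst.runqueue)
    if imbalance < 2:
        return False  # moving a task would not reduce the imbalance
    benefit = (imbalance - 1) * queue_weight
    # Prefer the candidate whose memory is closest to the destination core.
    task = min(src.runqueue, key=lambda t: distance[dst.node][t.home_node])
    penalty = distance[dst.node][task.home_node] - distance[src.node][task.home_node]
    if benefit <= penalty:
        return False  # remote-memory cost outweighs the load gain: stay put
    src.runqueue.remove(task)
    dst.runqueue.append(task)
    return True
```

With distance = [[10, 20], [20, 10]], a node-0 core holding three tasks hands its node-1 task to an idle node-1 core, while a node-0 core holding two node-0 tasks keeps both unless queue_weight is raised above 10.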

Note that the kind of processor affinity such a scheduler needs is weak and automatic: which core a thread is pinned to is entirely up to the scheduler and may change whenever the scheduler considers it beneficial. This is in sharp contrast to processor affinity as exposed by, for example, the Linux scheduler, where affinity is hard (a thread cannot run on a core it has no affinity with) and managed manually by the user (see sched_setaffinity and pthread_setaffinity_np).
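For contrast, this is roughly what hard, user-managed affinity looks like from Python on Linux, using the standard library's os.sched_getaffinity and os.sched_setaffinity wrappers around the corresponding system calls. The pinned_to helper is hypothetical; it restores the previous mask on exit so the scheduler regains its freedom of placement.

```python
import os
from contextlib import contextmanager

@contextmanager
def pinned_to(cpu: int):
    """Temporarily restrict the caller to a single CPU (Linux only)."""
    previous = os.sched_getaffinity(0)     # pid 0 = the caller
    os.sched_setaffinity(0, {cpu})         # hard affinity: kernel won't run us elsewhere
    try:
        yield
    finally:
        os.sched_setaffinity(0, previous)  # hand placement back to the scheduler
```

Usage: with pinned_to(min(os.sched_getaffinity(0))): run the latency-sensitive work. Inside the block, os.sched_getaffinity(0) reports only that one CPU.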
