How can I distinguish between high- and low-performance cores/threads in C++?

When talking about multi-threading, it often seems like threads are treated as equal: just like the main thread, but running next to it.

On some newer processors, however, such as the Apple M1 chip and the upcoming Intel Alder Lake series, not all threads are equally performant, because these chips feature separate high-performance cores and slower, high-efficiency cores.

That's not to say there weren't already things like hyper-threading, but this seems to have a much larger performance implication.

Is there a way to query std::thread's properties and enforce which cores they'll run on in C++?

How to distinguish between high- and low-performance cores/threads in C++?

Please understand that "thread" is an abstraction of the hardware's capabilities, and that something beyond your control (the OS, the kernel's scheduler) is responsible for creating and managing this abstraction. "Importance" and performance hints are part of that abstraction (typically presented in the form of a thread priority).

Any attempt to break the "thread" abstraction (e.g. determining whether the core is a low-performance or high-performance core) is misguided. For example, the OS could move your thread to a low-performance core immediately after you find out that you were running on a high-performance core, leading you to assume that you're on a high-performance core when you are not.

Even pinning your thread to a specific core (in the hope that it'll always be using a high-performance core) can and will backfire: you get less work done, because you've prevented yourself from using a "faster than nothing" low-performance core when the high-performance cores are busy doing other work.

The biggest problem is that C++ creates a worse abstraction (std::thread) on top of the "likely better" abstraction provided by the OS. Specifically, there's no way to set, modify or obtain the thread priority using std::thread, so you're left without any control over the "performance hints" that the OS and scheduler need to make good "load vs. performance vs. power management" decisions.

When talking about multi-threading, it often seems like threads are treated as equal

Often people think we're still using time-sharing systems from the 1960s. Stop listening to these fools. Modern systems do not allow CPU time to be wasted on unimportant work while more important work waits. Effective use of thread priorities is a fundamental performance requirement. Everything else (the "load vs. performance vs. power management" decisions) is, by necessity, beyond your control (on the other side of the "thread" abstraction you're using).

Is there any way to query std::thread's properties and enforce on which cores they'll run in C++?

No. There is no standard API for this in C++.

Platform-specific APIs do have the ability to assign a software thread to a specific logical core (or a set of such cores). For example, GNU has pthread_setaffinity_np.
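As a rough illustration of the affinity call mentioned above, here is a minimal sketch for Linux with glibc. The helper name pin_to_cpus is made up for this example, and the std::thread overload assumes that native_handle() returns a pthread_t, which holds for libstdc++ and libc++ on Linux but is not guaranteed by the standard.

```cpp
// Linux/glibc only: pthread_setaffinity_np is a non-portable GNU extension.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>

#include <thread>
#include <vector>

// Hypothetical helper: restrict the thread behind `handle` to the given
// logical CPU indices. Returns 0 on success, otherwise the error number
// reported by pthread_setaffinity_np (e.g. EINVAL if no listed CPU is usable).
int pin_to_cpus(pthread_t handle, const std::vector<int>& cpus) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu : cpus) {
        if (cpu >= 0 && cpu < CPU_SETSIZE) {
            CPU_SET(cpu, &set);
        }
    }
    return pthread_setaffinity_np(handle, sizeof(set), &set);
}

// Convenience overload for std::thread.
// Assumption: native_handle() yields a pthread_t on this platform.
int pin_to_cpus(std::thread& t, const std::vector<int>& cpus) {
    return pin_to_cpus(t.native_handle(), cpus);
}
```

A call such as pin_to_cpus(worker, {0, 1}) would ask the kernel to keep a running std::thread named worker on logical CPUs 0 and 1. It says nothing about whether those are fast or slow cores.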

Note that this allows you to specify "core 1" for your thread, but that doesn't necessarily help with getting the "performance" core unless you know which core that is. To figure that out, you may need to go below the OS level and into CPU-specific assembly programming. In the case of Intel, to my understanding, you would use the Enhanced Hardware Feedback Interface.
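To give an idea of what "CPU-specific" means here, the sketch below uses a different and simpler mechanism than the Hardware Feedback Interface: the CPUID hybrid leaf on Intel processors. It assumes GCC or Clang on x86 (for the cpuid.h header). The names CoreKind and current_core_kind are invented for this example. The result only describes the core the calling thread happens to be on at the instant of the call, so it is only meaningful if the thread has been pinned first.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helper header for the CPUID instruction
#endif

enum class CoreKind { Unknown, Performance, Efficiency };

// Hypothetical helper: classify the core the calling thread is running on
// right now. Returns Unknown on non-x86 builds and on non-hybrid CPUs.
CoreKind current_core_kind() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;

    // CPUID leaf 7, sub-leaf 0: EDX bit 15 is the "hybrid" flag.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return CoreKind::Unknown;
    }
    if ((edx & (1u << 15)) == 0) {
        return CoreKind::Unknown;
    }

    // CPUID leaf 0x1A: EAX bits 31..24 hold the core type of this core.
    if (!__get_cpuid_count(0x1A, 0, &eax, &ebx, &ecx, &edx)) {
        return CoreKind::Unknown;
    }
    switch (eax >> 24) {
        case 0x40: return CoreKind::Performance;  // "Intel Core" type
        case 0x20: return CoreKind::Efficiency;   // "Intel Atom" type
        default:   return CoreKind::Unknown;
    }
#else
    return CoreKind::Unknown;
#endif
}
```

One way to build a map of the machine would be to pin a thread to each logical CPU in turn and call this function there. As the first answer warns, acting on that map takes the decision away from the scheduler.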

No, the C++ standard library has no direct way to query the sub-type of a CPU, or to state that you want a thread to run on a specific CPU.

But std::thread (and jthread) does have .native_handle(), which on most platforms will let you do this.

If you know the threading library implementation of your std::thread, you can use native_handle() to get at the underlying primitives, then use the underlying threading library to do this kind of low-level work.

This will be completely non-portable, of course.
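A small sketch of the native_handle() route, assuming a POSIX platform where the underlying primitive is a pthread_t: it reads back the scheduling policy and priority that std::thread itself does not expose. SchedInfo and query_sched are illustrative names, not part of any library.

```cpp
#include <pthread.h>
#include <sched.h>

#include <thread>

// Result of the query: `error` is 0 on success, otherwise an error number.
struct SchedInfo {
    int error;
    int policy;    // e.g. SCHED_OTHER, SCHED_FIFO, SCHED_RR
    int priority;  // meaningful relative to sched_get_priority_min/max(policy)
};

// Hypothetical helper: ask the pthread layer how a thread is scheduled.
SchedInfo query_sched(pthread_t handle) {
    SchedInfo info{0, 0, 0};
    sched_param param{};
    info.error = pthread_getschedparam(handle, &info.policy, &param);
    if (info.error == 0) {
        info.priority = param.sched_priority;
    }
    return info;
}

// Convenience overload for a std::thread that has not been joined or detached.
// Assumption: native_handle() yields a pthread_t on this platform.
SchedInfo query_sched(std::thread& t) {
    return query_sched(t.native_handle());
}
```

The matching setter, pthread_setschedparam, takes the same handle. Which policies and priorities an unprivileged process may request depends on the operating system.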

iPhones, iPads and newer Macs have high- and low-performance cores for a reason. The low-performance cores allow a reasonable amount of work to be done while using the smallest possible amount of energy, making the device's battery last longer. These additional cores are not there just for fun; if you try to get around them, you can end up with a much worse experience for the user.

If you use the C++ standard library to run multiple threads, the operating system will detect what you are doing and act accordingly. If your task only takes 10 ms on a high-performance core, it will be moved to a low-performance core; that's fast enough and saves battery life. If you have multiple threads using 100% of the CPU time, the high-performance cores will be used automatically (plus the low-performance cores as well). If your battery runs low, the device can switch to using only low-performance cores, which gets more work done with the battery charge you have left.

You should really think about what you want to do. You should put the needs of the user ahead of your perceived needs. Apart from that, Apple recommends assigning OS-specific priorities to your threads, which improves behaviour if you do it right. Giving a thread the highest priority so you can get better benchmark results is usually not "doing it right".
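On Apple platforms the OS-specific priorities take the form of quality-of-service (QoS) classes. The sketch below tags the calling thread as background work via pthread_set_qos_class_self_np; the wrapper name mark_current_thread_as_background is hypothetical, and the non-Apple branch is only a stub so the example compiles elsewhere.

```cpp
#include <cerrno>

#ifdef __APPLE__
#include <pthread.h>
#include <pthread/qos.h>
#endif

// Hypothetical helper: tell the OS that the calling thread does deferrable
// background work. Returns 0 on success, an error number on failure, and
// ENOTSUP on platforms without the Apple QoS API.
int mark_current_thread_as_background() {
#ifdef __APPLE__
    // Other classes include QOS_CLASS_UTILITY, QOS_CLASS_USER_INITIATED
    // and QOS_CLASS_USER_INTERACTIVE. The second argument is a relative
    // priority within the class (0 is the highest, negative values lower it).
    return pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
#else
    return ENOTSUP;
#endif
}
```

The function would typically be called at the top of the function passed to std::thread. The scheduler then decides which kind of core the thread lands on, which keeps the choice on the OS side of the abstraction.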

You can't select the core that a thread will be physically scheduled to run on using std::thread. See here for more. I'd suggest using a framework like OpenMP or MPI; otherwise you will have to dig into the native Mac OS APIs to select the core for your thread to execute on.
