
Set CPU affinity when creating a thread

I want to create a C++11 thread that runs on my first core. I have found that pthread_setaffinity_np and sched_setaffinity can change the CPU affinity of a thread and migrate it to the specified CPU. However, these calls change the affinity only after the thread is already running.

How can I create a C++11 thread with a specific CPU affinity (a cpu_set_t object)?

If it is impossible to specify the affinity when initializing a C++11 thread, how can I do it with pthread_t in C?

My environment is G++ on Ubuntu. A code example would be appreciated.

I am sorry to be the "myth buster" here, but setting thread affinity has great importance, and it grows in importance over time as the systems we all use become more and more NUMA (Non-Uniform Memory Architecture) by nature. Even a trivial dual-socket server these days has RAM connected separately to each socket, and the difference between accessing a socket's own RAM and the RAM of the neighboring processor socket (remote RAM) is substantial. In the near future, processors are hitting the market in which the internal set of cores is NUMA in itself (separate memory controllers for separate groups of cores, etc.). There is no need for me to repeat the work of others here; just search for "NUMA and thread affinity" online, and you can learn from the years of experience of other engineers.

Not setting thread affinity is effectively equal to "hoping" that the OS scheduler will handle thread affinity correctly. Let me explain: You have a system with some NUMA nodes (processing and memory domains). You start a thread, and the thread does some stuff with memory, e.g. mallocs some memory and then processes it. Modern OSes (at least Linux, and probably others too) do a good job thus far: by default, the memory is allocated (if available) from the same domain as the CPU where the thread is running. In time, the time-sharing OS (all modern OSes) will put the thread to sleep. When the thread is put back into the running state, it may be made runnable on any of the cores in the system (since you did not set an affinity mask for it), and the larger your system is, the higher the chance it will be "woken up" on a CPU which is remote from the memory it previously allocated or used. Now, all your memory accesses would be remote (not sure what this means for your application's performance? Read more about remote memory access on NUMA systems online.)

So, to summarize: affinity-setting interfaces are VERY important when running code on systems with a more-than-trivial architecture, which is rapidly becoming "any system" these days. Some thread runtime environments/libraries allow this to be controlled at runtime without any specific programming (see OpenMP, for example Intel's implementation of the KMP_AFFINITY environment variable). It would be the right thing for C++11 implementers to include similar mechanisms in their runtime libraries and language options; until then, if your code is aimed at servers, I strongly recommend that you implement affinity control in your code.

Yes, there is a way to do it. I came across this method on this blog link.

I rewrote the code from Eli Bendersky's blog (linked above). You can save the code below to test.cpp, then compile and run it:

// g++ ./test.cpp -lpthread && ./a.out
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

#include <pthread.h>
#include <sched.h>

int main(int argc, const char** argv) {
  constexpr unsigned num_threads = 4;
  // A mutex ensures orderly access to std::cout from multiple threads.
  std::mutex iomutex;
  std::vector<std::thread> threads(num_threads);
  for (unsigned i = 0; i < num_threads; ++i) {
    threads[i] = std::thread([&iomutex, i] {
      while (true) {
        {
          // Use a lexical scope and lock_guard to safely lock the mutex only
          // for the duration of std::cout usage.
          std::lock_guard<std::mutex> iolock(iomutex);
          std::cout << "Thread #" << i << ": on CPU " << sched_getcpu() << "\n";
        }

        // Simulate important work done by the thread by sleeping for a bit...
        std::this_thread::sleep_for(std::chrono::milliseconds(900));
      }
    });

    // Create a cpu_set_t object representing a set of CPUs. Clear it and mark
    // only CPU i as set. Setting the affinity here in the parent thread, via
    // the std::thread's native handle, avoids racing against the assignment
    // of threads[i] (which the lambda would otherwise read before it is set).
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(i, &cpuset);
    int rc = pthread_setaffinity_np(threads[i].native_handle(),
                                    sizeof(cpu_set_t), &cpuset);
    if (rc != 0) {
      std::cerr << "Error calling pthread_setaffinity_np: " << rc << "\n";
    }
  }

  for (auto& t : threads) {
    t.join();  // The workers loop forever; stop the demo with Ctrl-C.
  }
  return 0;
}

In C++11 you cannot set the thread affinity when the thread is created (unless the function being run in the thread does it on its own), but once the thread is created, you can set the affinity via whatever native interface you have by getting the native handle for the thread (thread.native_handle()). On Linux you can get the pthread id via:

pthread_t my_thread_native = my_thread.native_handle();

Then you can pass my_thread_native to any pthread call that expects a pthread thread id.

Note that most threading facilities are implementation-specific: pthreads, Windows threads, and the native threads of other OSes all have their own interfaces and types, so this portion of your code will not be very portable.

After searching for a while, it seems that we cannot set CPU affinity when we create a C++ thread.

The reason may be that there is no strict NEED to specify the affinity when creating a thread, so why bother making it possible in the language?

Say we want the workload f() to be bound to CPU 0. We can simply change the affinity to CPU 0 right at the start of the real workload by calling pthread_setaffinity_np.

However, we CAN specify the affinity when creating a thread in C (thanks to the comment from Tony D). For example, the following code outputs "Hello pthread".

#include <iostream>
#include <pthread.h>
#include <sched.h>

void *f(void *p) {
  std::cout << "Hello pthread" << std::endl;
  return NULL;
}

int main() {
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(0, &cpuset);

  // Attach the CPU set to a thread attribute object, so the thread is
  // created with this affinity already in place.
  pthread_attr_t pta;
  pthread_attr_init(&pta);
  pthread_attr_setaffinity_np(&pta, sizeof(cpuset), &cpuset);

  pthread_t thread;
  if (pthread_create(&thread, &pta, f, NULL) != 0) {
    std::cerr << "Error in creating thread" << std::endl;
  }
  pthread_join(thread, NULL);
  pthread_attr_destroy(&pta);
  return 0;
}
