How many threads should I create?

Based on this question, I have a class whose constructor only does some assignments, plus a build() member function that does the actual work.

I know that the number of objects I will have to build is in the range of [2, 16]. The actual number is a user parameter.

I create my objects in a for loop like this

for (int i = 0; i < n; ++i) {
  roots.push_back(RKD<DivisionSpace>(...));
}

and then in another for loop I create the threads. Every thread calls build() on a chunk of objects, based on this logic:

If your vector has n elements and you have p threads, thread i writes only to the elements with indices in

[i * n / p, (i + 1) * n / p).

So for example, the situation is like this:

std::vector<RKD<Foo>> foos;
// here is a for loop that pushes back 'n' objects to foos

// thread A         // thread B                 // thread C
foos[0].build();    foos[n / 3 + 0].build();    foos[2 * n / 3 + 0].build();
foos[1].build();    foos[n / 3 + 1].build();    foos[2 * n / 3 + 1].build();
foos[2].build();    foos[n / 3 + 2].build();    foos[2 * n / 3 + 2].build();
...                 ...                         ...
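
The chunking rule above can be sketched as a small helper. This is only an illustration: the name chunk_bounds is hypothetical and does not come from the original code. It relies on integer division, so the chunks cover every index exactly once even when p does not divide n.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: returns the half-open index range [begin, end)
// that thread i (0-based) out of p threads owns in a vector of n elements.
std::pair<std::size_t, std::size_t> chunk_bounds(std::size_t i,
                                                 std::size_t p,
                                                 std::size_t n) {
    assert(p > 0 && i < p);  // at least one thread, and a valid thread index
    return {i * n / p, (i + 1) * n / p};
}
```

Thread i would then loop over foos[begin] up to, but not including, foos[end] and call build() on each element.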

The approach I followed was to determine the number of threads p like this:

p = min(n, P) 

where n is the number of objects I want to create and P is the return value of std::thread::hardware_concurrency(). After dealing with some issues that this C++11 feature has, I read this:

Even when hardware_concurrency is implemented, it cannot be relied on as a direct mapping to the number of cores. This is what the standard says it returns: "The number of hardware thread contexts." It goes on to state: "This value should only be considered to be a hint." If your machine has hyperthreading enabled, it's entirely possible the value returned will be 2x the number of cores. If you want a reliable answer, you'll need to use whatever facilities your OS provides. – Praetorian
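
Besides being only a hint, std::thread::hardware_concurrency() may return 0 when the value is not computable, so p = min(n, P) needs a fallback. A minimal sketch, assuming a fallback of 2 threads (an arbitrary choice) and using the hypothetical names clamp_thread_count and default_thread_count:

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: computes p = min(n, hw), substituting `fallback`
// when the hardware hint is 0 and never returning fewer than 1 thread.
unsigned clamp_thread_count(unsigned n, unsigned hw, unsigned fallback = 2) {
    if (hw == 0) hw = fallback;  // hint unavailable on this platform
    return std::max(1u, std::min(n, hw));
}

// Convenience wrapper that queries the standard library for the hint.
unsigned default_thread_count(unsigned n) {
    return clamp_thread_count(n, std::thread::hardware_concurrency());
}
```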

That means that I should probably change my approach, since this code is meant to be run by many users, not only on my system. So I would like to choose the number of threads in a way that is both standard and efficient. Since the number of objects is relatively small, is there some rule to follow?

Just pick a thread pool of hardware_concurrency threads and queue the items on a first come, first served basis.

If other processes in the system somehow get priority from the OS, so be it. This simply means that fewer threads than the allocated pool size (e.g. P - 1) can run simultaneously. It doesn't matter, since the first pool thread that finishes build()-ing one item will pick the next item from the queue.
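
A minimal sketch of that idea, assuming the items already sit in a vector and each has a build() member. Here the "queue" is just an atomic index that every worker increments to claim the next item; the name build_all is hypothetical, and a real thread pool library could be used instead.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs build() on every item using pool_size worker
// threads. Items are handed out first come, first served.
template <typename T>
void build_all(std::vector<T>& items, unsigned pool_size) {
    if (pool_size == 0) pool_size = 1;  // always run at least one worker
    std::atomic<std::size_t> next{0};   // index of the next unclaimed item

    auto worker = [&items, &next] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);  // claim one item
            if (i >= items.size()) return;            // queue is empty
            items[i].build();
        }
    };

    std::vector<std::thread> pool;
    pool.reserve(pool_size);
    for (unsigned t = 0; t < pool_size; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();  // wait until every item is built
}
```

With this scheme a thread that gets descheduled simply claims fewer items, so the work balances itself without fixed per-thread chunks.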

To really avoid threads competing over the same core, you could

  • use a semaphore (interprocess semaphore if you want to actually coordinate the builder threads from separate processes)

  • set thread affinity (to prevent the OS from scheduling a particular thread onto a different core in the next time slice); sadly, I don't think there is a standard, platform-independent way to set thread affinity (yet).

I see no compelling reason to make it more complicated than that.
