C++ Pthreads: Algorithm to run N threads simultaneously when >N threads must be run each iteration

Question

I have a program that needs to run a function M times per iteration, and those runs can be parallelized. Let's say I'm limited to running N threads at a time (say, by the number of cores available). I need an algorithm that makes sure I'm always running N threads (so long as the number of runs remaining is >= N), and that algorithm must not depend on the order in which those threads complete. Also, the thread scheduling algorithm should not claim significant CPU time.

I have something like the following in mind, but it's clearly flawed.

#include <iostream>
#include <pthread.h>
#include <cstdlib>

void *find_num(void* arg)
{
    double num = rand();
    for(double q=0; 1; q++)
        if(num == q)
        {
            std::cout << "\n--";
            return 0;
        }
}


int main ()
{
    srand(0);

    const int N = 2;
    pthread_t threads [N];
    for(int q=0; q<N; q++)
        pthread_create(&threads [q], NULL, find_num, NULL);

    int M = 30;
    int launched=N;
    int finnished=0;
    while(1)
    {
        for(int w=0; w<N; w++)
        {
            //inefficient if `threads [1]` is done before `threads [0]`
            pthread_join( threads [w], NULL);
            finnished++;
            std::cout << "\n" << finnished;
            if(finnished == M)
                break;
            if(launched < M)
            {
                pthread_create(&threads [w], NULL, find_num, NULL);
                launched++;
            }
        }

        if(finnished == M)
            break;
    }
}

The obvious problem here is that if threads[1] finishes before threads[0] there is wasted CPU time, and I can't think of how to get around that. Also, I'm assuming that having the main routine waiting on pthread_join() is not a significant drain on CPU time?

Answer 1

I would advise against respawning threads; it's a rather serious overhead. Instead, create a pool of N threads and submit work to them via a work queue, which is a rather standard approach. Even if your remaining work is less than N, the extra threads will not do any harm; they'll just stay blocked on the work queue.
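
To make the pool idea concrete, here is a minimal sketch using plain pthreads. The names WorkQueue, do_job and run_pool are made up for illustration and the job itself is an empty placeholder, so treat this as one possible shape under those assumptions rather than a finished implementation.

```cpp
#include <pthread.h>
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical work queue: jobs are just integer ids in this sketch.
struct WorkQueue {
    std::queue<int> jobs;
    bool closed;               // true once no more jobs will be submitted
    int completed;             // number of jobs finished so far
    pthread_mutex_t mutex;
    pthread_cond_t not_empty;

    WorkQueue() : closed(false), completed(0) {
        pthread_mutex_init(&mutex, NULL);
        pthread_cond_init(&not_empty, NULL);
    }
    ~WorkQueue() {
        pthread_cond_destroy(&not_empty);
        pthread_mutex_destroy(&mutex);
    }
};

// Placeholder for the real per-job computation (assumption).
static void do_job(int /*job_id*/) {}

static void* worker(void* arg) {
    WorkQueue* q = static_cast<WorkQueue*>(arg);
    for (;;) {
        pthread_mutex_lock(&q->mutex);
        // Block (without spinning) until there is work or the queue is closed.
        while (q->jobs.empty() && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->mutex);
        if (q->jobs.empty()) {             // closed and fully drained
            pthread_mutex_unlock(&q->mutex);
            return NULL;
        }
        int job = q->jobs.front();
        q->jobs.pop();
        pthread_mutex_unlock(&q->mutex);

        do_job(job);                       // run outside the lock

        pthread_mutex_lock(&q->mutex);
        ++q->completed;
        pthread_mutex_unlock(&q->mutex);
    }
}

// Runs m jobs on a pool of n threads; returns the number of jobs completed.
int run_pool(int m, int n) {
    WorkQueue q;
    std::vector<pthread_t> pool(n);
    for (int i = 0; i < n; ++i) {
        int rc = pthread_create(&pool[i], NULL, worker, &q);
        assert(rc == 0);
        (void)rc;
    }
    for (int j = 0; j < m; ++j) {
        pthread_mutex_lock(&q.mutex);
        q.jobs.push(j);
        pthread_mutex_unlock(&q.mutex);
        pthread_cond_signal(&q.not_empty);
    }
    pthread_mutex_lock(&q.mutex);
    q.closed = true;                       // tell idle workers to exit
    pthread_mutex_unlock(&q.mutex);
    pthread_cond_broadcast(&q.not_empty);

    for (int i = 0; i < n; ++i)
        pthread_join(pool[i], NULL);
    return q.completed;
}
```

Calling run_pool(30, 2) would correspond to M = 30 and N = 2 from the question. Whichever worker becomes free first takes the next job, so completion order does not matter, and in a real program the pool would normally be kept alive across iterations instead of being rebuilt each time.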

If you insist on your current approach, you can do it like this:

Do not wait for threads with pthread_join; you don't need it, since you're not communicating anything back to the main thread. Create the threads with the attribute PTHREAD_CREATE_DETACHED and just let them exit.

In the main thread, wait on a semaphore, which is signaled by each exiting thread - in effect you would wait for any thread termination. If you don't have <semaphore.h> for any reason, it's trivial to implement it with mutexes and conditions.

#include <semaphore.h>
#include <iostream>
#include <pthread.h>
#include <cstdlib>

sem_t exit_sem;

void *find_num(void* arg)
{
    double num = rand();
    for(double q=0; 1; q++)
        if(num == q)
        {
            std::cout << "\n--";
            break;  /* leave the loop so the sem_post below is reached */
        }

    /* Tell the main thread we have exited.  */
    sem_post (&exit_sem);
    return NULL;
}

int main ()
{
    srand(0);

    /* Initialize process-private semaphore with 0 initial count.  */
    sem_init (&exit_sem, 0, 0);
    const int N = 2;

    pthread_attr_t attr;
    pthread_attr_init (&attr);
    pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);

    /* pthread_create stores the new thread's ID through its first
       argument, so it must point to a valid pthread_t even though
       the ID is never used here.  */
    pthread_t tid;
    for(int q=0; q<N; q++)
        pthread_create(&tid, &attr, find_num, NULL);

    int M = 30;
    int launched=N;
    int finished=0;
    while(finished < M)
    {
        /* Wait for any thread to exit, don't care which.  */
        sem_wait (&exit_sem);

        finished++;
        std::cout << "\n" << finished;
        if(launched < M)
        {
            pthread_create(&tid, &attr, find_num, NULL);
            launched++;
        }
    }

    pthread_attr_destroy (&attr);
}
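
For platforms without <semaphore.h>, the mutex-and-condition replacement mentioned above could look roughly like this. CountingSemaphore, exiting_thread and wait_for_exits are illustrative names, not part of any library, and the demo uses joinable threads only so that the semaphore is guaranteed to outlive them.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Minimal counting semaphore built from a mutex and a condition variable.
class CountingSemaphore {
public:
    explicit CountingSemaphore(unsigned initial) : count_(initial) {
        pthread_mutex_init(&mutex_, NULL);
        pthread_cond_init(&nonzero_, NULL);
    }
    ~CountingSemaphore() {
        pthread_cond_destroy(&nonzero_);
        pthread_mutex_destroy(&mutex_);
    }
    // Equivalent of sem_post: increment the count and wake one waiter.
    void post() {
        pthread_mutex_lock(&mutex_);
        ++count_;
        pthread_cond_signal(&nonzero_);   // signalled while holding the lock
        pthread_mutex_unlock(&mutex_);
    }
    // Equivalent of sem_wait: block until the count is positive, then take one.
    void wait() {
        pthread_mutex_lock(&mutex_);
        while (count_ == 0)               // loop guards against spurious wakeups
            pthread_cond_wait(&nonzero_, &mutex_);
        --count_;
        pthread_mutex_unlock(&mutex_);
    }
private:
    CountingSemaphore(const CountingSemaphore&);            // non-copyable
    CountingSemaphore& operator=(const CountingSemaphore&);
    unsigned count_;
    pthread_mutex_t mutex_;
    pthread_cond_t nonzero_;
};

// Stand-in for a worker: it only reports that it is about to exit.
static void* exiting_thread(void* arg) {
    static_cast<CountingSemaphore*>(arg)->post();
    return NULL;
}

// Starts n threads and waits for n exit notifications, in whatever order
// they arrive. Returns the number of notifications observed.
int wait_for_exits(int n) {
    CountingSemaphore exit_sem(0);
    std::vector<pthread_t> threads(n);
    for (int i = 0; i < n; ++i) {
        int rc = pthread_create(&threads[i], NULL, exiting_thread, &exit_sem);
        assert(rc == 0);
        (void)rc;
    }
    int seen = 0;
    for (int i = 0; i < n; ++i) {
        exit_sem.wait();
        ++seen;
    }
    for (int i = 0; i < n; ++i)
        pthread_join(threads[i], NULL);
    return seen;
}
```

The wait() call blocks inside pthread_cond_wait, so the main thread uses no CPU while it waits for the next exit.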

Anyway, I would again recommend the thread-pool/work-queue approach.

Answer 2

If main() is waiting on pthread_join then (assuming that on your platform it isn't implemented as just a spin lock) it will cause no CPU load; if pthread_join is blocked on a mutex, the scheduler won't give that thread any time until the mutex is released.

If N truly is the number of cores, then maybe you should forget about managing the thread scheduling yourself; the OS scheduler will take care of that. If N is less than the number of cores, perhaps you could set thread affinity to run your process on N cores only (or spawn a calculation process which does that, if you don't want to set thread affinity for the rest of your process). Again, the point of doing this would be to let the OS scheduler deal with scheduling.
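
As a rough illustration of the affinity idea, the following Linux-only sketch restricts the calling thread to at most n of the CPUs it is currently allowed to use. It relies on sched_getaffinity and sched_setaffinity, which are Linux-specific, and limit_to_n_cpus is a made-up helper name. Threads created afterwards inherit the mask, so the assumption is that this runs early in main() before any workers are started.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for the CPU_* macros on glibc (g++ usually defines it)
#endif
#include <sched.h>
#include <cassert>

// Restricts the calling thread to at most n of its currently allowed CPUs.
// Returns the number of CPUs selected, or -1 on failure.
int limit_to_n_cpus(int n) {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    cpu_set_t chosen;
    CPU_ZERO(&chosen);
    int picked = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && picked < n; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {   // only pick CPUs we may actually use
            CPU_SET(cpu, &chosen);
            ++picked;
        }
    }
    if (picked == 0)
        return -1;
    if (sched_setaffinity(0, sizeof(chosen), &chosen) != 0)
        return -1;
    return picked;
}
```

With the mask in place, the program can simply start all M threads and leave it to the OS scheduler to spread them over the selected cores.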

Answer 3

I'd most definitely look at OpenMP or C++11 async. To be honest, at this point I think that OpenMP is more viable.

OpenMP

Here is a quick example that will sometimes find the correct answer (42) randomly, using 2 threads.

Note that if you leave out the omp.h include and the call to omp_set_num_threads(2); you'll get the native number of threads (i.e. depending on the number of cores available at runtime). OpenMP also lets you configure this number dynamically by setting an environment variable, e.g. OMP_NUM_THREADS=16. Indeed, you can dynamically disable parallelism entirely :)

I even threw in a sample thread parameter and result accumulation; this is usually where things become a little more interesting than just kicking off a job and forgetting about it. Then again, it may be overkill for your question :)

Compiled with g++ -fopenmp test.cpp -o test

#include <iostream>
#include <cstdlib>
#include <ctime>   // for time(), used to seed rand()
#include <omp.h>

int find_num(int w)
{
    return rand() % 100;
}

int main ()
{
    srand(time(0));

    omp_set_num_threads(2); // optional! leave it out to get native number of threads

    bool found = false;

#pragma omp parallel for reduction (|:found)
    for(int w=0; w<30; w++)
    {
        if (!found) 
        {
             found = (find_num(w) == 42);
        }
    }

    std::cout << "number was found: " << (found? "yes":"no") << std::endl;
}
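
For comparison, the C++11 async alternative mentioned at the top of this answer might be sketched as follows. It requires a C++11 compiler (for example g++ -std=c++11 -pthread). The names run_with_async and do_job are hypothetical, and the job body is a placeholder. Instead of one task per job, n long-lived tasks pull job indices from a shared atomic counter, which keeps n jobs in flight until fewer than n remain.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <vector>

// Placeholder for the real per-job computation (assumption).
static void do_job(int /*job_id*/) {}

// Runs m jobs with at most n running concurrently.
// Returns the total number of jobs executed.
int run_with_async(int m, int n) {
    assert(n > 0);
    std::atomic<int> next(0);                 // index of the next unclaimed job
    std::vector<std::future<int>> workers;
    for (int i = 0; i < n; ++i) {
        // std::launch::async forces a real thread instead of deferred execution.
        workers.push_back(std::async(std::launch::async, [&next, m]() {
            int done = 0;
            // fetch_add hands each job index to exactly one worker.
            for (int job = next.fetch_add(1); job < m; job = next.fetch_add(1)) {
                do_job(job);
                ++done;
            }
            return done;
        }));
    }
    int total = 0;
    for (auto& w : workers)
        total += w.get();                     // blocks until that worker finishes
    return total;
}
```

The per-worker counts returned through the futures play the same role as the reduction in the OpenMP version: each worker accumulates locally and the results are combined at the end.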

Answer 4

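A simple solution is to have a global variable that is set when a thread finishes; the main loop polls that variable to detect when a thread has completed and only then calls pthread_join.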
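
A minimal sketch of this polling idea, using one completion flag per thread slot instead of a single global, could look like the following. Slot, job and run_polling are illustrative names, the job body is a placeholder, std::atomic requires C++11, and the 1 ms sleep between polls is an arbitrary choice.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <vector>

// One slot per concurrently running thread.
struct Slot {
    pthread_t thread;
    std::atomic<bool> done;   // set by the worker just before it returns
    bool active;              // true while a thread occupies this slot
    Slot() : done(false), active(false) {}
};

static void* job(void* arg) {
    Slot* s = static_cast<Slot*>(arg);
    // ... the real work would go here (placeholder) ...
    s->done.store(true);      // announce completion to the polling loop
    return NULL;
}

// Runs m jobs with at most n threads alive at once; returns jobs finished.
int run_polling(int m, int n) {
    std::vector<Slot> slots(n);
    int launched = 0, finished = 0;
    while (finished < m) {
        bool progressed = false;
        for (int i = 0; i < n; ++i) {
            Slot& s = slots[i];
            if (s.active && s.done.load()) {
                // The flag is set, so this join should return almost at once.
                pthread_join(s.thread, NULL);
                s.active = false;
                ++finished;
                progressed = true;
            }
            if (!s.active && launched < m) {
                s.done.store(false);
                s.active = true;
                int rc = pthread_create(&s.thread, NULL, job, &s);
                assert(rc == 0);
                (void)rc;
                ++launched;
                progressed = true;
            }
        }
        if (!progressed)
            usleep(1000);     // back off briefly instead of spinning
    }
    return finished;
}
```

The sleep interval trades CPU use against how quickly a free slot is noticed, which is the main cost of polling compared with blocking on a semaphore or condition variable.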

solution1
5 ACCEPTED 2011-11-01 08:38:38

solution2
1 2011-11-01 08:07:36

solution3
1 2011-11-01 08:08:29

solution4
1 2011-11-01 08:09:21