What's a good strategy for clean/reliable shutdown of threads that use pthread barriers for synchronization?

Question

I've got a pthread-based multithreaded program that has four threads indefinitely executing this run-loop (pseudocode):

while(keepRunning)
{
   pthread_barrier_wait(&g_stage_one_barrier);

   UpdateThisThreadsStateVariables();  

   pthread_barrier_wait(&g_stage_two_barrier);

   DoComputationsThatReadFromAllThreadsStateVariables();
}

This works pretty well, in that during stage one each thread updates its own state variables, and that's okay because no other thread is reading any other thread's state variables during stage one. Then during stage two it's a free-for-all as far as threads reading each others' state is concerned, but that's okay because during stage two no thread is modifying its local state variables, so they are effectively read-only.

My only remaining problem is, how do I cleanly and reliably shut down these threads when it's time for my application to quit? (By "cleanly and reliably", I mean without introducing potential deadlocks or race conditions, and ideally without having to send any UNIX-signals to force threads out of a pthread_barrier_wait() call)

My main() thread can of course set keepRunning to false for each thread, but then how does it get pthread_barrier_wait() to return for each thread? AFAICT the only way to get pthread_barrier_wait() to return is to have all four threads' execution-location inside pthread_barrier_wait() simultaneously, but that's difficult to do when some threads may have exited already.

Calling pthread_barrier_destroy() seems like what I'd want to do, but it's undefined behavior to do that while any threads might be waiting on the barrier.

Is there a good known solution to this problem?

Answer 1

Having two flags and using something like the following should work:

for (;;)
{
    pthread_barrier_wait(&g_stage_one_barrier);           +
                                                          |
    UpdateThisThreadsStateVariables();                    |
                                                          |
    pthread_mutex_lock(&shutdownMtx);                     | Zone 1
    pendingShutdown = !keepRunning;                       |
    pthread_mutex_unlock(&shutdownMtx);                   |
                                                          |
    pthread_barrier_wait(&g_stage_two_barrier);           +
                                                          |
    if (pendingShutdown)                                  |
        break;                                            | Zone 2
                                                          |
    DoComputationsThatReadFromAllThreadsStateVariables(); |
}

shutdownMtx should protect the setting of keepRunning too, though it's not shown.

The logic is that by the time pendingShutdown gets set to true , all the threads must be within Zone 1 . (This is true even if only some of the threads saw keepRunning being false , so races on keepRunning should be okay.) It follows that they will all reach pthread_barrier_wait(&g_stage_two_barrier) , and then all break out when they enter Zone 2 .

It would also be possible to check for PTHREAD_BARRIER_SERIAL_THREAD -- which is returned by pthread_barrier_wait() for exactly one of the threads -- and only do the locking and updating of pendingShutdown in that thread, which could improve performance.

Answer 2

You could have an additional thread that synchronises on the same barriers, but only exists as a "shutdown master". Your worker threads would use the exact code that you have in your question, and the "shutdown master" thread would do:

while (keepRunning)
{
    pthread_barrier_wait(&g_stage_one_barrier);

    pthread_mutex_lock(&mkr_lock);
    if (!mainKeepRunning)
        keepRunning = 0;
    pthread_mutex_unlock(&mkr_lock);

    pthread_barrier_wait(&g_stage_two_barrier);
}

When the main thread wants the other threads to shut down, it would just do:

pthread_mutex_lock(&mkr_lock);
mainKeepRunning = 0;
pthread_mutex_unlock(&mkr_lock);

(ie. the keepRunning variable becomes part of the shared thread state that is read-only during stage 2, and is owned by the shutdown master thread during stage 1).

Of course, you can also just pick one of your other threads to be the "shutdown master thread" rather than using a dedicated thread for that purpose.

Answer 3

There is a conflict of requirements: barrier semantics require all threads to be in to continue, and shutdown requires termination when threads are shared between execution blocks (could be inside different barriers).

I suggest to replace barrier with a custom implementation that would support extern cancel call.

Example (may not run, but the idea...):

struct _barrier_entry
{
  pthread_cond_t cond;
  volatile bool released;
  volatile struct _barrier_entry *next;
};

typedef struct
{
  volatile int capacity;
  volatile int count;
  volatile struct _barrier_entry *first;
  pthread_mutex_t lock;
} custom_barrier_t;

Initialization:

int custom_barrier_init(custom_barrier_t *barrier, int capacity)
{
   if (NULL == barrier || capacity <= 0)
   {
     errno = EINVAL;
     return -1;
   }
   barrier->capacity = capacity;
   barrier->count = 0;
   barrier->first = NULL;
   return pthread_mutex_init(&barrier->lock, NULL);
   return -1;
}

Helper:

static void _custom_barrier_flush(custom_barrier_t *barrier)
{
   struct _barrier_entry *ptr;
   for (ptr = barrier->first; NULL != ptr;)
   {
     struct _barrier_entry *next = ptr->next;
     ptr->released = true;
     pthread_cond_signal(&ptr->cond);
     ptr = next;
   }
   barrier->first = NULL;
   barrier->count = 0;
}

Blocking wait:

int custom_barrier_wait(custom_barrier_t *barrier)
{
   struct _barrier_entry entry;
   int result;
   pthread_cond_init(&barrier->entry, NULL);
   entry->next = NULL;
   entry->released = false;

   pthread_mutex_lock(&barrier->lock);
   barrier->count++;
   if (barrier->count == barrier->capacity)
   {
     _custom_barrier_flush(barrier);
     result = 0;
   }
   else
   {
     entry->next = barrier->first;
     barrier->first = entry;
     while (true)
     {
       pthread_cond_wait(&entry->cond, &barrier->lock);
       if (entry->released)
       {
         result = 0;
         break;
       }
       if (barrier->capacity < 0)
       {
         errno = ECANCELLED;
         result = -1;
         break;
       }
     }
   }
   pthread_mutex_unlock(&barrier->lock);
   pthread_cond_destroy(&entry->cond);
   return result;
}

Cancellation:

 int custom_barrier_cancel(custom_barrier_t *barrier)
 {
   pthread_mutex_lock(barrier->lock);
   barrier->capacity = -1;
   _custom_barrier_flush(barrier);
   pthread_mutex_unlock(barrier->lock);
   return 0;
 }

So the thread code can run in the loop, until it gets ECANCELLED error after custom_barrier_wait call.

Answer 4

The threads that are waiting at the barriers are not the issue, it's the threads that are still running UpdateThis... or DoComputations... that will delay the shutdown. You can reduce the shutdown time by periodically checking for shutdown inside the UpdateThis... and DoComputations... functions.

Here's the outline of one possible solution

main initializes a mutex g_shutdown_mutex
main locks the mutex
main launches the threads
the threads do their thing while periodically trying to lock the mutex, but since main has the mutex locked, the trylock function will always fail
when it's time to shutdown, main unlocks the mutex
now the trylock will succeed and the worker functions will return early
before reaching the second barrier, any thread that successfully locks the mutex sets a global variable g_shutdown_requested
after passing the second barrier, all the threads will see the same value in g_shutdown_requested and make the same decision whether to exit or not

So the while loop looks like this

while(1)
{
    pthread_barrier_wait(&g_stage_one_barrier);

    UpdateThisThreadsStateVariables();

    if ( pthread_mutex_trylock( &g_shutdown_mutex ) == 0 )
    {
        g_shutdown_requested = true;
        pthread_mutex_unlock( &g_shutdown_mutex );
        break;
    }

    pthread_barrier_wait(&g_stage_two_barrier);

    if ( g_shutdown_requested )
        break;

    DoComputationsThatReadFromAllThreadsStateVariables();
}

And the worker functions look like this

void UpdateThisThreadsStateVariables( void )
{
    for ( i = 0;; i++ )
    {
        // check the mutex once every 4000 times through the loop
        if ( (i & 0xfff) == 0 && pthread_mutex_trylock( &g_shutdown_mutex ) == 0 )
        {
            pthread_mutex_unlock( &g_shutdown_mutex );   // abnormal termination
            return; 
        }

        // do the important stuff here

        if ( doneWithTheImportantStuff )    // normal termination
            break;
    }
}

What's a good strategy for clean/reliable shutdown of threads that use pthread barriers for synchronization?

Question

4 answers

solution1
2 ACCPTED 2015-03-04 00:26:55

solution2
2 2015-03-04 04:23:14

solution3
1 2015-03-04 00:27:30

solution4
0 2015-03-03 23:59:03

What's a good strategy for clean/reliable shutdown of threads that use pthread barriers for synchronization?

Question

4 answers

solution1 2 ACCPTED 2015-03-04 00:26:55

solution2 2 2015-03-04 04:23:14

solution3 1 2015-03-04 00:27:30

solution4 0 2015-03-03 23:59:03

solution1
2 ACCPTED 2015-03-04 00:26:55

solution2
2 2015-03-04 04:23:14

solution3
1 2015-03-04 00:27:30

solution4
0 2015-03-03 23:59:03