Making a function thread safe: Thread-specific data vs mutex

Question

In The Linux Programming Interface, §31.3.4 "Employing the Thread-Specific Data API" gives a good example of using thread-specific data to make a thread-unsafe function thread-safe:

Thread-unsafe version:

/* Listing 31-1 */

/* strerror.c

   An implementation of strerror() that is not thread-safe.
*/
#define _GNU_SOURCE                 /* Get '_sys_nerr' and '_sys_errlist'
                                       declarations from <stdio.h> */
#include <stdio.h>
#include <string.h>                 /* Get declaration of strerror() */

#define MAX_ERROR_LEN 256           /* Maximum length of string
                                       returned by strerror() */

static char buf[MAX_ERROR_LEN];     /* Statically allocated return buffer */

char *
strerror(int err)
{
    if (err < 0 || err >= _sys_nerr || _sys_errlist[err] == NULL) {
        snprintf(buf, MAX_ERROR_LEN, "Unknown error %d", err);
    } else {
        strncpy(buf, _sys_errlist[err], MAX_ERROR_LEN - 1);
        buf[MAX_ERROR_LEN - 1] = '\0';          /* Ensure null termination */
    }

    return buf;
}

Thread-safe version with thread-specific data:

/* Listing 31-3 */

/* strerror_tsd.c

   An implementation of strerror() that is made thread-safe through
   the use of thread-specific data.

   See also strerror_tls.c.
*/
#define _GNU_SOURCE                 /* Get '_sys_nerr' and '_sys_errlist'
                                       declarations from <stdio.h> */
#include <stdio.h>
#include <string.h>                 /* Get declaration of strerror() */
#include <pthread.h>
#include "tlpi_hdr.h"

static pthread_once_t once = PTHREAD_ONCE_INIT;
static pthread_key_t strerrorKey;

#define MAX_ERROR_LEN 256           /* Maximum length of string in per-thread
                                       buffer returned by strerror() */

static void                         /* Free thread-specific data buffer */
destructor(void *buf)
{
    free(buf);
}

static void                         /* One-time key creation function */
createKey(void)
{
    int s;

    /* Allocate a unique thread-specific data key and save the address
       of the destructor for thread-specific data buffers */

    s = pthread_key_create(&strerrorKey, destructor);
    if (s != 0)
        errExitEN(s, "pthread_key_create");
}

char *
strerror(int err)
{
    int s;
    char *buf;

    /* Make first caller allocate key for thread-specific data */

    s = pthread_once(&once, createKey);
    if (s != 0)
        errExitEN(s, "pthread_once");

    buf = pthread_getspecific(strerrorKey);
    if (buf == NULL) {          /* If first call from this thread, allocate
                                   buffer for thread, and save its location */
        buf = malloc(MAX_ERROR_LEN);
        if (buf == NULL)
            errExit("malloc");

        s = pthread_setspecific(strerrorKey, buf);
        if (s != 0)
            errExitEN(s, "pthread_setspecific");
    }

    if (err < 0 || err >= _sys_nerr || _sys_errlist[err] == NULL) {
        snprintf(buf, MAX_ERROR_LEN, "Unknown error %d", err);
    } else {
        strncpy(buf, _sys_errlist[err], MAX_ERROR_LEN - 1);
        buf[MAX_ERROR_LEN - 1] = '\0';          /* Ensure null termination */
    }

    return buf;
}

And in the Summary section of this chapter it says:

...
Most of the functions specified in SUSv3 are required to be thread-safe. SUSv3 also lists a small set of functions that are not required to be thread-safe. Typically, these are functions that employ static storage to return information to the caller or to maintain information between successive calls. By definition, such functions are not reentrant, and mutexes can't be used to make them thread-safe . We considered two roughly equivalent coding techniques—thread-specific data and thread-local storage—that can be used to render an unsafe function thread-safe without needing to change its interface.
...

I understand that thread-specific data is used to make a thread-unsafe function thread-safe without changing the function's interface/signature.

But I don't understand:

By definition, such functions are not reentrant, and mutexes can't be used to make them thread-safe.

Question:

Why does it say that "mutexes can't be used... while thread-specific data can..."? Are there conditions under which I can make a thread-unsafe function thread-safe only with thread-specific data, but not with a mutex?
I think I can make the thread-unsafe strerror() thread-safe simply by adding a mutex. Does that make any difference compared to the posted version that uses thread-specific data? (Maybe it loses some concurrency, since I would use a mutex to lock the code that accesses the static variable.)

Answer 1

I think I can make the thread-unsafe strerror() thread-safe simply by adding a mutex.

Well, you're wrong, and the authors of SUSv3 are right.

To see why a mutex can't make these non-reentrant functions thread-safe, consider the original (unsafe) code for strerror.

Adding a mutex can make strerror itself safe.

That is, we can avoid data races between concurrent calls to strerror in different threads.

This is what I think you had in mind: lock a mutex at the start, unlock at the end, job done. Easy.

It is also, however, entirely worthless, because the caller can never safely use the returned buffer: it is still shared with other threads, and the mutex protects it only inside the call to strerror.

The only way to make the function safe and useful (using mutexes) is for the caller to hold the mutex until it has finished using the buffer, which ... requires an interface change.
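To make this concrete, here is a minimal sketch (mine, not from the book). The names strerror_locked, strerror_caller_buf and format_error are hypothetical, and the message text is a placeholder rather than the real error table. The first function keeps the strerror() interface and adds a mutex; the second changes the interface so that the caller owns the storage.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERROR_LEN 256

static char buf[MAX_ERROR_LEN];         /* One buffer shared by all threads */
static pthread_mutex_t buf_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for looking up the real message text */
static void
format_error(char *dst, size_t len, int err)
{
    snprintf(dst, len, "Error number %d", err);
}

/* Same interface as strerror(). The lock covers the write, but the caller
   reads the result only after the lock has been released. */
char *
strerror_locked(int err)
{
    pthread_mutex_lock(&buf_mtx);
    format_error(buf, sizeof(buf), err);
    pthread_mutex_unlock(&buf_mtx);

    return buf;                         /* Still the single shared buffer */
}

/* Changed interface: the caller supplies the storage, so nothing is shared */
int
strerror_caller_buf(int err, char *dst, size_t len)
{
    assert(dst != NULL && len > 0);     /* Caller must pass a real buffer */

    format_error(dst, len, err);
    return 0;
}
```

Every call to strerror_locked() returns the same address, so a second call (from any thread) overwrites the text the first caller may still be reading. strerror_caller_buf() avoids that, at the cost of a different signature, which is the trade-off strerror_r() makes.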

Answer 2

Why does it say that "mutexes can't be used... while thread-specific data can..."?

Mutexes protect shared data only within the region protected by the mutex. If all such regions are protected by the same mutex then all is well, but consider a function such as strtok(), which stores static state between calls. That state could be protected against data races by use of a mutex, but that does not prevent two different threads from interfering with each other if they try to use strtok at the same time: each can make unexpected and unwanted changes to strtok's internal state, relative to what the other thread expects. This is exactly the reason why strtok_r() was introduced.
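As a rough illustration of what strtok_r() buys you, the sketch below walks two strings in alternation, keeping a separate saved position for each. The function name count_tokens_interleaved is made up for this example. The interleaving happens in one thread here only to keep the example deterministic; two threads calling plain strtok() would trample the single hidden position in the same way.

```c
#define _POSIX_C_SOURCE 200809L         /* Get declaration of strtok_r() */
#include <assert.h>
#include <string.h>

/* Count the tokens in 'a' and 'b' while alternating between them.
   Both strings are modified in place, as with strtok(). */
int
count_tokens_interleaved(char *a, char *b, const char *delim)
{
    char *save_a, *save_b;              /* One saved position per string */
    char *ta, *tb;
    int n = 0;

    assert(a != NULL && b != NULL && delim != NULL);

    ta = strtok_r(a, delim, &save_a);
    tb = strtok_r(b, delim, &save_b);

    while (ta != NULL || tb != NULL) {
        if (ta != NULL) {
            n++;
            ta = strtok_r(NULL, delim, &save_a);
        }
        if (tb != NULL) {
            n++;
            tb = strtok_r(NULL, delim, &save_b);
        }
    }

    return n;
}
```

The extra save pointer is the interface change: the state that strtok() hides in static storage becomes something each caller owns.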

Or consider a function such as ctime(), which returns a pointer to static data. Not only can two threads corrupt each other's (shared) data by overwriting it via calls to ctime, but they can even modify it directly through the returned pointer.
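For comparison, a small sketch of the caller-owned-buffer alternative, ctime_r(). The wrapper format_two_times and the constant CTIME_BUF_LEN are names invented for this example; the 26-byte minimum is what POSIX specifies for ctime_r(). Note that POSIX.1-2008 marks ctime_r() obsolescent in favor of strftime(), so treat this as an illustration of the interface style rather than a recommendation.

```c
#define _POSIX_C_SOURCE 200809L         /* Get declaration of ctime_r() */
#include <assert.h>
#include <string.h>
#include <time.h>

#define CTIME_BUF_LEN 26                /* Minimum buffer size for ctime_r() */

/* Format two timestamps into two caller-owned buffers. With ctime() both
   results could end up in the same static buffer, so the second call
   could overwrite the first result. */
int
format_two_times(time_t t1, time_t t2, char *out1, char *out2)
{
    assert(out1 != NULL && out2 != NULL);

    if (ctime_r(&t1, out1) == NULL || ctime_r(&t2, out2) == NULL)
        return -1;

    return 0;
}
```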

Even if there were a mutex protecting such data and exposed to user code, the library cannot ensure that all user threads will cooperate by using it appropriately. What's more, using such a mutex would create bottlenecks, and providing multiple different mutexes for such purposes would create abundant opportunities for deadlock.

Thread-specific data, on the other hand, works around such issues by maintaining separate data for each thread, automatically. It does not protect a thread from interfering with itself, and it can be foiled by client code leaking thread-specific-data pointers across threads, but it still provides safety that mutexes do not. Plus, it does not create bottlenecks and does not contribute to deadlocking.
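Here is a minimal sketch of the same idea using thread-local storage, which the book treats as roughly equivalent to the thread-specific data API. It assumes a C11 compiler (_Thread_local); strerror_tls, struct probe and callers_result_survives are hypothetical names, and the message text is a placeholder. A second thread calls the function while the first thread is still holding its result, and the first thread's result is left untouched.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERROR_LEN 256

/* Each thread gets its own copy of this buffer */
static _Thread_local char buf[MAX_ERROR_LEN];

char *
strerror_tls(int err)
{
    int n = snprintf(buf, sizeof(buf), "Error number %d", err);

    assert(n > 0 && (size_t) n < sizeof(buf));  /* Message fits in buffer */
    return buf;
}

struct probe {
    char *callers_buf;          /* Buffer address seen by the first thread */
    int distinct;               /* Set to 1 if the worker saw another buffer */
};

static void *
worker(void *arg)
{
    struct probe *p = arg;

    /* This call writes only the worker's own buffer */
    p->distinct = (strerror_tls(2) != p->callers_buf);
    return NULL;
}

/* Returns 1 if the calling thread's result is unaffected by another
   thread's call, 0 otherwise (including on thread creation failure) */
int
callers_result_survives(void)
{
    pthread_t t;
    struct probe p = { strerror_tls(1), 0 };

    if (pthread_create(&t, NULL, worker, &p) != 0)
        return 0;
    if (pthread_join(t, NULL) != 0)
        return 0;

    return p.distinct && strcmp(p.callers_buf, "Error number 1") == 0;
}
```

No lock is taken anywhere, which is why this approach neither creates a bottleneck nor adds a deadlock risk. Build with -pthread (or your platform's equivalent).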

Are there conditions under which I can make a thread-unsafe function thread-safe only with thread-specific data, but not with a mutex?

Analogs of the strtok() and ctime() functions discussed above could be written using thread-specific data instead of static data. Implemented correctly, such a strtok_tsd() function would be perfectly thread-safe. Such a ctime_tsd() function would be thread-safe, too, subject to the limitation that user code must not leak any pointers to its TSD region to another thread.
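One possible shape for such a function, sketched with C11 thread-local storage rather than the pthread key API for brevity. The name strtok_tls is hypothetical; it keeps the two-argument strtok() interface and simply stores the saved position per thread, delegating the real work to strtok_r().

```c
#define _POSIX_C_SOURCE 200809L         /* Get declaration of strtok_r() */
#include <assert.h>
#include <string.h>

/* Same interface as strtok(), but the saved position is per thread */
char *
strtok_tls(char *str, const char *delim)
{
    static _Thread_local char *saved;   /* This thread's position only */

    assert(delim != NULL);

    return strtok_r(str, delim, &saved);
}
```

Two threads can now tokenize different strings concurrently without disturbing each other. As noted above, this does not stop a single thread from interfering with itself, for example by starting a second tokenization before finishing the first.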

The flip side, of course, is that thread-specific data is totally unsuited for data that are supposed to be shared among threads. This is a clear and natural distinction between the regimes best served by each approach. Thread-specific data provides an analog of mutable, static data that is suited for use in multithreaded scenarios where the data involved are or may be tied to a specific series of computations, and thus should not be shared among threads.

I think I can make the thread-unsafe strerror() thread-safe simply by adding a mutex.

Nope. strerror() is another function in the mold of ctime(). The problem is not so much that strerror() itself is unsafe, but that there is no safe way for a multithreaded program to use its result.

Does that make any difference compared to the posted version that uses thread-specific data?

Yes. Returning (a pointer to) thread-specific data allows the calling thread to safely access the result. Returning (a pointer to) static data does not, mutex usage within the called function notwithstanding.

solution1
6 ACCEPTED 2019-12-26 17:53:20

solution2
2 2019-12-26 17:52:47
