
Volatile and CreateThread

I just asked a question involving volatile: volatile array c++

However my question spawned a discussion on what volatile does.

Some claim that when using CreateThread(), you don't have to worry about volatile. Microsoft, on the other hand, gives an example that uses volatile with two threads created by CreateThread().

I created the following sample in Visual C++ 2010 Express, and it behaves the same whether or not done is marked volatile:

#include "targetver.h"
#include <Windows.h>
#include <stdio.h>
#include <iostream>
#include <tchar.h>

using namespace std;

bool done = false;
DWORD WINAPI thread1(LPVOID args)
{
    while(!done)
    {

    }
    cout << "Thread 1 done!\n";
    return 0;
}
DWORD WINAPI thread2(LPVOID args)
{
    Sleep(1000);
    done = true;
    cout << "Thread 2 done!\n";
    return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
    DWORD thread1Id;
    HANDLE hThread1;
    DWORD thread2Id;
    HANDLE hThread2;

    hThread1 = CreateThread(NULL, 0, thread1, NULL, 0, &thread1Id);
    hThread2 = CreateThread(NULL, 0, thread2, NULL, 0, &thread2Id);
    Sleep(4000);
    CloseHandle(hThread1);
    CloseHandle(hThread2);

    return 0;
}

Can you ALWAYS be sure that thread 1 will stop if done is not volatile?

What volatile does:

  • Prevents the compiler from optimizing out any access. Every read/write will result in a read/write instruction.
  • Prevents the compiler from reordering the access with other volatile accesses.

What volatile does not:

  • Make the access atomic.
  • Prevent the compiler from reordering with non-volatile accesses.
  • Make changes from one thread visible in another thread.

Some non-portable behaviors that shouldn't be relied on in cross-platform C++:

  • VC++ has extended volatile to prevent any reordering with other instructions. Other compilers don't, because it negatively affects optimization.
  • x86 makes aligned read/write of pointer-sized and smaller variables atomic, and immediately visible to other threads. Other architectures don't.

Most of the time, what people really want are fences (also called barriers) and atomic instructions, which are usable if you've got a C++11 compiler, or via compiler- and architecture-dependent functions otherwise.

Fences ensure that, at the point of use, all the previous reads/writes will be completed. In C++11, fences are controlled at various points using the std::memory_order enumeration. In VC++ you can use _ReadBarrier(), _WriteBarrier(), and _ReadWriteBarrier() to do this. I'm not sure about other compilers.

On some architectures like x86, a fence is merely a way to prevent the compiler from reordering instructions. On others they might actually emit an instruction to prevent the CPU itself from reordering things.

Here's an example of improper use:

int res1, res2;
volatile bool finished;

void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    finished = true;
}

void spinning_thread()
{
    while(!finished); // spin wait for res to be set.
}

Here, finished is allowed to be reordered to before either res is set! Well, volatile prevents reordering with other volatile, right? Let's try making each res volatile too:

volatile int res1, res2;
volatile bool finished;

void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    finished = true;
}

void spinning_thread()
{
    while(!finished); // spin wait for res to be set.
}

This trivial example will actually work on x86, but it is going to be inefficient. For one, it forces res1 to be set before res2, even though we don't really care about that... we just want both of them set before finished is. Forcing this ordering between res1 and res2 only prevents valid optimizations, eating away at performance.

For more complex problems, you'd have to make every write volatile. That would bloat your code, be very error-prone, and become slow, since it prevents far more reordering than you actually wanted.

It's not realistic. So we use fences and atomics. They allow full optimization, and only guarantee that the memory access will complete at the point of the fence:

int res1, res2;
std::atomic<bool> finished;

void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    finished.store(true, std::memory_order_release);
}

void spinning_thread()
{
    while(!finished.load(std::memory_order_acquire));
}

This will work on all architectures. The res1 and res2 operations can be reordered as the compiler sees fit. Performing an atomic release ensures that all preceding non-atomic operations are completed and visible to any thread whose atomic acquire observes that release.

volatile simply prevents the compiler from making assumptions about (read: optimizing) access to the value declared volatile. In other words, if you declare something volatile, you are basically saying it may change its value at any time for reasons the compiler is not aware of, so every time you reference the variable, the compiler must look up its value at that moment.
In this instance, the compiler might decide to cache done's value in a processor register, independent of changes that might happen elsewhere, i.e. thread 2 setting it to true.
I would guess the reason it worked in your example is that all references to done actually hit the real location of done in memory. You cannot expect this to always be the case, especially when you start requesting higher levels of optimization.
Additionally, I would like to point out that this is not an appropriate use of the volatile keyword for synchronization. It might happen to be atomic, but only by circumstance. I would advise you to use an actual thread synchronization construct, like a condition variable or mutex, instead. See http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/ for a fantastic explanation.

Can you ALWAYS be sure that thread 1 will stop if done is not volatile?

Always? No. But in this case the assignment to done is in the same module, and the while loop will probably not be optimized out. It depends on how MSVC performs its optimizations.

Generally, it is safer to declare it volatile to avoid uncertainty with optimizations.

It's worse than you think, actually: some compilers may decide that the loop is either a no-op or an infinite loop, eliminate the infinite-loop case, and make it return immediately no matter what done is. And the compiler is certainly free to keep done in a local CPU register and never read its updated value in the loop. You must use either appropriate memory barriers, a volatile flag variable (which technically isn't enough on certain CPU architectures), or a lock-protected variable for a flag like this.

Compiling on Linux with g++ 4.1.2, I wrote the equivalent of your example:

#include <pthread.h>

bool done = false;

void* thread_func(void*r) {
  while(!done) {};
  return NULL;
}

void* write_thread_func(void*r) {
  done = true;
  return NULL;
}


int main() {
  pthread_t t1,t2;
  pthread_create(&t1, NULL, thread_func, NULL);
  pthread_create(&t2, NULL, write_thread_func, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
}

When compiled with -O3, the compiler cached the value, so it checked once and then entered an infinite loop if done wasn't set the first time.

However, then I changed the program to the following:

#include <pthread.h>

bool done = false;
pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;

void* thread_func(void*r) {
  pthread_mutex_lock(&mu);
  while(!done) {
    pthread_mutex_unlock(&mu);
    pthread_mutex_lock(&mu);
  };
  pthread_mutex_unlock(&mu);
  return NULL;
}

void* write_thread_func(void*r) {

  pthread_mutex_lock(&mu);
  done = true;
  pthread_mutex_unlock(&mu);
  return NULL;
}


int main() {
  pthread_t t1,t2;
  pthread_create(&t1, NULL, thread_func, NULL);
  pthread_create(&t2, NULL, write_thread_func, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
}

While this is still a spin (it just repeatedly locks and unlocks the mutex), the compiler now re-reads the value of done after each return from pthread_mutex_unlock, causing it to work properly.

Further tests show that calling any external function appears to cause it to re-examine the variable.

volatile IS NOT a synchronisation mechanism. It DOES NOT guarantee atomicity or ordering. If you cannot guarantee that all operations performed on a shared resource are atomic, then you MUST use proper locking!

Finally, I highly recommend reading these articles:

  1. Volatile: Almost Useless for Multi-Threaded Programming
  2. Should volatile Acquire Atomicity and Thread Visibility Semantics?
