(Operating System) How can I use __asm mfence in C?

I'm taking an operating system class and my professor gave us this homework.

"Place __asm mfence in a proper position."

This problem is about using multiple threads and the side effects that causes.

The main thread increments shared_var, but thread_1 increments it at the same time.

As a result, shared_var ends up as 199048359.000 even though the code increments it 200000000 times (10000 x 10000 iterations in each of the two threads).

The professor said __asm mfence will solve this issue, but I do not know where to place it.

I have searched for this problem on Google, GitHub and here, but I cannot find a source.

I do not know whether this is a stupid question, because I'm not majoring in computer science.

Also, I would like to know why this code prints 199948358.0000 and not 200000000.00.

Any help would be greatly appreciated.

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <conio.h>

int turn;
int interested[2];
void EnterRegion(int process);
void LeaveRegion(int process);

DWORD WINAPI thread_func_1(LPVOID lpParam);
volatile double shared_var = 0.0;
volatile int    job_complete[2] = {0, 0};


int main(void)
{
    DWORD dwThreadId_1, dwThrdParam_1 = 1; 
    HANDLE hThread_1; 
    int     i, j;

    // Create Thread 1
    hThread_1 = CreateThread( 
        NULL,                        // default security attributes 
        0,                           // use default stack size  
        thread_func_1,                  // thread function 
        &dwThrdParam_1,                // argument to thread function 
        0,                           // use default creation flags 
        &dwThreadId_1
        );                // returns the thread identifier 

   // Check the return value for success. 

    if (hThread_1 == NULL) 
    {
       printf("Thread 1 creation error\n");
       exit(0);
    }
    else 
    {
       CloseHandle( hThread_1 );
    }

    /* I am main thread */
    /* Now Main Thread and Thread 1 runs concurrently */

    for (i = 0; i < 10000; i++) 
    {
        for (j = 0; j < 10000; j++) 
        {
            EnterRegion(0);
            shared_var++;
            LeaveRegion(0);
        }
    }

    printf("Main Thread completed\n");
    job_complete[0] = 1;
    while (job_complete[1] == 0) ;

    printf("%f\n", shared_var);
    _getch();
    ExitProcess(0);
}


DWORD WINAPI thread_func_1(LPVOID lpParam)
{
    int     i, j;

    for (i = 0; i < 10000; i++) {
        for (j = 0; j < 10000; j++) 
        {
            EnterRegion(1);
            shared_var++;
            LeaveRegion(1);
        }
    }

    printf("Thread_1 completed\n");
    job_complete[1] = 1;
    ExitThread(0);
}


void EnterRegion(int process)
{
    _asm mfence;
    int other;

    other = 1 - process;
    interested[process] = TRUE;
    turn = process;
    while (turn == process && interested[other] == TRUE) {}
    _asm mfence;
}

void LeaveRegion(int process)
{
    _asm mfence;
    interested[process] = FALSE;
    _asm mfence;
}

The EnterRegion() and LeaveRegion() functions implement a critical region using "Peterson's algorithm".

Now, the key to Peterson's algorithm is that when a thread reads turn it must get the latest (most recent) value written by any thread. That is, operations on turn must be Sequentially Consistent. Also, the write to interested[] in EnterRegion() must become visible to all threads before, or at the same time as, the write to turn.

So the place to put the mfence is after the turn = process; -- so that the thread does not proceed until its write to turn is visible to all other threads.

It is also important to persuade the compiler to read from memory every time it reads turn and interested[], so you should declare them volatile.

If you are writing this for x86 or x86_64, that is sufficient -- because they are generally "well behaved", so that:

  • all the writes to turn and interested[process] will occur in program order

  • all the reads of turn and interested[other] will also occur in program order

and setting those volatile ensures that the compiler doesn't fiddle with the order, either.
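As a minimal sketch of the placement described above, assuming an x86/x86_64 target, the two functions might look like this. FULL_FENCE is a hypothetical macro name, not part of the homework: it uses the _mm_mfence() intrinsic instead of the _asm mfence statement, because inline assembly is only accepted by 32-bit MSVC, and it falls back to a C11 fence on other targets. The reasoning that volatile plus one fence is enough is specific to x86/x86_64.

```c
/* FULL_FENCE is a hypothetical name chosen for this sketch. */
#if defined(__x86_64__) || defined(_M_X64) || defined(_M_IX86) || defined(__SSE2__)
#include <emmintrin.h>              /* _mm_mfence(): emits the mfence instruction */
#define FULL_FENCE() _mm_mfence()
#else
#include <stdatomic.h>              /* assumption: C11 fallback on non-x86 targets */
#define FULL_FENCE() atomic_thread_fence(memory_order_seq_cst)
#endif

/* volatile makes the compiler re-read these from memory on every access
   and keeps it from reordering the accesses among themselves. */
volatile int turn;
volatile int interested[2];

void EnterRegion(int process)
{
    int other = 1 - process;

    interested[process] = 1;        /* announce interest first ...          */
    turn = process;                 /* ... then write turn                  */
    FULL_FENCE();                   /* wait until both writes are visible   */

    /* Only now read turn and the other thread's flag. */
    while (turn == process && interested[other] == 1) {
        /* busy-wait */
    }
}

void LeaveRegion(int process)
{
    interested[process] = 0;        /* no fence needed here on x86/x86_64 */
}
```

With these definitions, the loops in main() and thread_func_1() can stay as they are: EnterRegion(0) / LeaveRegion(0) in the main thread and EnterRegion(1) / LeaveRegion(1) in thread 1.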

The reason for using the mfence on the x86 and x86_64 in this case is to flush the write queue to memory before proceeding to read the turn value. All memory writes go into a queue, and at some time in the future each write will reach actual memory, and the effect of the write will become visible to other threads -- the write has "completed". Writes "complete" in the same order the program did them, but delayed.

If the thread reads something it has written recently, the processor will pick the (most recent) value out of the write queue. This means that the thread does not need to wait until the write "completes", which is generally a Good Thing. However, it does mean that the thread is not reading the same value that any other thread will read, at least until the write does "complete". What the mfence does is to stall the processor until all outstanding writes have "completed" -- so any following reads will read the same thing any other thread would read.
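The write-queue behaviour can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not a description of real hardware: the Cpu type and the cpu_store(), cpu_load(), cpu_fence() and simulate_entry() names are all invented for this example, and the two "CPUs" run in lockstep inside one ordinary thread.

```c
#define QCAP 8

/* Memory locations: 0 = interested[0], 1 = interested[1], 2 = turn. */
int memory[3];

typedef struct { int addr, value; } Store;
typedef struct { Store q[QCAP]; int n; } Cpu;    /* per-CPU FIFO write queue */

/* Model of mfence: all queued writes "complete", in program order. */
void cpu_fence(Cpu *c)
{
    for (int i = 0; i < c->n; i++)
        memory[c->q[i].addr] = c->q[i].value;
    c->n = 0;
}

/* A write goes into the CPU's own queue, not straight to memory. */
void cpu_store(Cpu *c, int addr, int value)
{
    if (c->n == QCAP)
        cpu_fence(c);                            /* queue full: drain it first */
    c->q[c->n].addr = addr;
    c->q[c->n].value = value;
    c->n++;
}

/* A read sees the CPU's own newest queued write, otherwise memory. */
int cpu_load(const Cpu *c, int addr)
{
    for (int i = c->n - 1; i >= 0; i--)
        if (c->q[i].addr == addr)
            return c->q[i].value;
    return memory[addr];
}

/* Both CPUs run the entry step of Peterson's algorithm at the same moment.
   Returns how many of them decide they may enter the critical region. */
int simulate_entry(int use_fence)
{
    Cpu cpu[2] = {0};
    int entered = 0;

    memory[0] = memory[1] = memory[2] = 0;

    for (int p = 0; p < 2; p++) {
        cpu_store(&cpu[p], p, 1);                /* interested[p] = TRUE */
        cpu_store(&cpu[p], 2, p);                /* turn = p             */
    }
    if (use_fence) {
        cpu_fence(&cpu[0]);
        cpu_fence(&cpu[1]);
    }
    for (int p = 0; p < 2; p++) {
        int must_wait = cpu_load(&cpu[p], 2) == p &&
                        cpu_load(&cpu[p], 1 - p) == 1;
        if (!must_wait)
            entered++;
    }
    return entered;
}
```

In this model simulate_entry(0) returns 2: each CPU reads its own queued value of turn but still sees the other's flag as 0 in memory, so both enter and mutual exclusion is lost. simulate_entry(1) returns 1, because after the fences both CPUs read the same turn and the same flags.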

The write to interested[] in LeaveRegion() does not (on x86/x86_64) require an mfence, which is good because mfence is a costly operation. Each thread only ever writes to its own interested[] flag and only ever reads the other's. The only constraint on this write is that it must not "complete" after the write in EnterRegion(). Happily, the x86/x86_64 does all writes in order. [Though, of course, after the write in LeaveRegion(), the write in EnterRegion() may "complete" before the other thread reads the flag.]

For other devices, you might want other fences to enforce the ordering of reads/writes of turn and interested[]. But I don't pretend to know enough to advise on ARM or PowerPC or anything else.
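One way to avoid choosing fences by hand for each device is to let the compiler do it. The following is a sketch using C11 <stdatomic.h>, which is a swapped-in technique and not what the homework asks for; it assumes a compiler with C11 atomics support. The names turn_a, interested_a, EnterRegionPortable and LeaveRegionPortable are invented for this example. Plain atomic_store() and atomic_load() are sequentially consistent by default, which is the property Peterson's algorithm relies on.

```c
#include <stdatomic.h>

/* Hypothetical names, chosen to keep this apart from the code above. */
atomic_int turn_a;
atomic_int interested_a[2];

void EnterRegionPortable(int process)
{
    int other = 1 - process;

    /* Sequentially consistent stores: the compiler inserts whatever
       fence the target needs, so no explicit mfence appears here. */
    atomic_store(&interested_a[process], 1);
    atomic_store(&turn_a, process);

    while (atomic_load(&turn_a) == process &&
           atomic_load(&interested_a[other]) == 1) {
        /* busy-wait */
    }
}

void LeaveRegionPortable(int process)
{
    atomic_store(&interested_a[process], 0);
}
```

Because the stores and loads are sequentially consistent, an ordinary variable such as shared_var that is only touched between EnterRegionPortable() and LeaveRegionPortable() is protected as well.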

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
