HANDLE used in C++

Question

I have a question regarding CreateMutex().

I am working on image data and run certain calculations for different rotations of the image. I rotate the image through 180 steps of 1° each. The steps are independent of each other except for writing the results back, so I decided to make this multi-threaded (the calculations are very intensive, and writing to memory takes almost none of the execution time).

I first tried a single mutex that controls whether a thread may write, but that hurt performance a lot (taking the single-threaded version without a mutex as 100%, execution speed dropped to around 80%).

I then created an array of HANDLEs, one per pixel (since the image is 656x480, that is around 300k handles). This brought my code down to around 15% of the original execution time (with 7 threads running simultaneously).

Now when I watch this in Task Manager, I see a separate counter called Handles. It sits at around 30k with only the OS and a few programs running, and climbs to 350k while my code is running.

Is this behaviour okay, or is it bad and should it be changed? If so, why, and how?

Answer 1

I would say that a single process using 350k+ handles is far too many. (One handle per pixel, really?)

If you're looking to improve the overall efficiency of your application using multiple threads, a good thing to do is reduce the amount of contention between those threads. I'm not quite sure what your application is doing, but if you are creating 180 different rotations of a single source image, then you might consider making N copies of the source image (where N is the number of threads you want to run), and having each thread work on its own copy of the source image. Then you won't need to have mutexes at all, and you will reduce the contention between threads.
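A minimal sketch of the "no shared writes" idea, using private result buffers rather than copies of the source image. It assumes the per-angle results are combined by summing into one result buffer, which the question does not confirm. accumulateAngle() and processAllAngles() are made-up names, and the maths inside accumulateAngle() is only a placeholder for the real calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-angle calculation: adds the
// contribution of one rotation angle into a result buffer.
void accumulateAngle(int angleDeg, std::vector<double>& result) {
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] += static_cast<double>(angleDeg);  // placeholder maths
    }
}

// Processes angles [0, angleCount) on threadCount threads. Each thread
// writes only to its own private buffer, so no mutex is needed.
std::vector<double> processAllAngles(int angleCount, std::size_t pixelCount,
                                     unsigned threadCount) {
    assert(threadCount > 0);
    std::vector<std::vector<double>> partial(
        threadCount, std::vector<double>(pixelCount, 0.0));

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&partial, t, threadCount, angleCount] {
            // Thread t handles angles t, t + threadCount, t + 2*threadCount, ...
            for (int a = static_cast<int>(t); a < angleCount;
                 a += static_cast<int>(threadCount)) {
                accumulateAngle(a, partial[t]);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }

    // Single-threaded merge once every worker has finished.
    std::vector<double> total(pixelCount, 0.0);
    for (const std::vector<double>& p : partial) {
        for (std::size_t i = 0; i < pixelCount; ++i) {
            total[i] += p[i];
        }
    }
    return total;
}
```

With 7 threads and a 656x480 image this costs 7 extra buffers of about 300k values each, which is small next to 300k kernel handles. If your results are not combined by addition, the merge step would have to change accordingly.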

Answer 2

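You should be using CRITICAL_SECTION, not mutexes. For synchronising threads within a single process they are much faster, because an uncontended critical section can be acquired in user mode without a trip into the kernel. You can get spinlock-like behaviour if you initialise with InitializeCriticalSectionAndSpinCount().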
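A rough sketch of what that could look like, wrapped in a small RAII class so it works with std::lock_guard. The class name Lock, the function countWithLock() and the spin count of 4000 are illustrative choices of mine, not anything from your code. The non-Windows branch exists only so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

#ifdef _WIN32
#include <windows.h>
// Thin RAII wrapper around a Win32 CRITICAL_SECTION with a spin count.
class Lock {
public:
    Lock() {
        // 4000 is an arbitrary spin count picked for illustration.
        InitializeCriticalSectionAndSpinCount(&cs_, 4000);
    }
    ~Lock() { DeleteCriticalSection(&cs_); }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
    void lock() { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
};
#else
// Fallback so the sketch also compiles on non-Windows systems.
using Lock = std::mutex;
#endif

// Several threads increment one shared counter, holding the lock only
// for the duration of the write.
long long countWithLock(unsigned threadCount, int incrementsPerThread) {
    assert(threadCount > 0);
    Lock lock;
    long long total = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                std::lock_guard<Lock> guard(lock);  // unlocks at end of scope
                ++total;
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return total;
}
```

One lock like this, or a small fixed number of them, should be enough if the time spent writing really is tiny compared with the calculation.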

Like others have said, having a mutex for every pixel is insane. How many threads do you have?

You ought not to need any locking at all, and you could process an image in parallel with OpenMP instead of creating all these threads yourself. With OpenMP you can have one parallelised outer loop going over each row of the output image, and inside that loop you visit each pixel in that row. Each iteration writes only to its own output row, so the writes are independent.

To do the rotation, you take each output pixel's position, find where the inverse rotation maps it in the input image, and area-sample the colour values at that position. This should not be computationally intensive at all, especially since you only have to do a single sin and cos calculation for each image (the angle doesn't change from pixel to pixel).

So, to recap... No hand-managed worker threads, no mutexes, no redundant calls to sin/cos. You'll be surprised how quick your code ends up.

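// Needs <cmath>. Build with OpenMP enabled (/openmp on MSVC, -fopenmp on GCC/Clang).
double sintheta = sin(theta);
double costheta = cos(theta);

#pragma omp parallel for
for( int y = 0; y < height; y++ ) {
    // Each iteration writes only to its own output row.
    RGB * outputRow = &outputImage[y * width];
    double dy = (double)y - yCentre;

    for( int x = 0; x < width; x++ ) {
        // Inverse rotation about the centre: where in the input image does
        // this output pixel come from?  (Flip the sign of sintheta if your
        // rotation direction is the other way round.)
        double dx = (double)x - xCentre;
        double rotx = xCentre + dx * costheta + dy * sintheta;
        double roty = yCentre - dx * sintheta + dy * costheta;

        // Interpolate colour from input image at (rotx, roty).  We've landed
        // inside a 2x2 square of pixels.  Take some of each.  I'll leave the
        // sampling to you...
        RGB val = RGB();
        // TODO

        // Output the rotated pixel without thread contention.
        outputRow[x] = val;
    }
}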
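For what it's worth, here is one way the sampling step left as a TODO could be filled in. This is a sketch under my own assumptions: RGB is an 8-bit-per-channel struct, the image is stored row by row in a std::vector, the rotation is about the image centre, samples outside the input count as black, and plain bilinear interpolation stands in for the area sampling. sampleBilinear() and rotateImage() are names I made up.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed pixel layout: 8 bits per channel.
struct RGB { std::uint8_t r, g, b; };

// Bilinear sample of img at (fx, fy). Pixels outside the image are
// treated as black (an assumption; clamp or wrap if you prefer).
RGB sampleBilinear(const std::vector<RGB>& img, int width, int height,
                   double fx, double fy) {
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const double tx = fx - x0;
    const double ty = fy - y0;
    double acc[3] = {0.0, 0.0, 0.0};
    for (int j = 0; j < 2; ++j) {
        for (int i = 0; i < 2; ++i) {
            const int xi = x0 + i;
            const int yi = y0 + j;
            if (xi < 0 || xi >= width || yi < 0 || yi >= height) continue;
            // Weight of this corner of the 2x2 square.
            const double w = (i ? tx : 1.0 - tx) * (j ? ty : 1.0 - ty);
            const RGB& p = img[static_cast<std::size_t>(yi) * width + xi];
            acc[0] += w * p.r;
            acc[1] += w * p.g;
            acc[2] += w * p.b;
        }
    }
    return RGB{static_cast<std::uint8_t>(std::lround(acc[0])),
               static_cast<std::uint8_t>(std::lround(acc[1])),
               static_cast<std::uint8_t>(std::lround(acc[2]))};
}

// Rotates input by theta radians about its centre. Each loop iteration
// writes only to its own output row, so no locking is required.
std::vector<RGB> rotateImage(const std::vector<RGB>& input, int width,
                             int height, double theta) {
    assert(width > 0 && height > 0);
    assert(input.size() == static_cast<std::size_t>(width) * height);
    std::vector<RGB> output(input.size());
    const double sintheta = std::sin(theta);  // computed once per image
    const double costheta = std::cos(theta);
    const double xCentre = (width - 1) / 2.0;
    const double yCentre = (height - 1) / 2.0;

    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        RGB* outputRow = &output[static_cast<std::size_t>(y) * width];
        const double dy = y - yCentre;
        for (int x = 0; x < width; x++) {
            const double dx = x - xCentre;
            // Inverse rotation: where this output pixel comes from.
            const double rotx = xCentre + dx * costheta + dy * sintheta;
            const double roty = yCentre - dx * sintheta + dy * costheta;
            outputRow[x] = sampleBilinear(input, width, height, rotx, roty);
        }
    }
    return output;
}
```

Without OpenMP enabled the pragma is simply ignored and the loop runs single-threaded, which makes it easy to compare the two timings.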

2 answers

solution1
3 ACCEPTED 2012-07-30 23:07:56

solution2
0 2012-07-31 02:46:57
