Serial code much slower than using only one thread in C?

Question

So, I was doing some benchmark tests with threads, and i wrote these pieces of code:

resp_threadless[] and resp_threaded[] are global int arrays and their size is n;

int n = 100000;

void function() {
  for (long j = 0; j < n; ++j) {
    int count = 0;
    double x = vetor[j];
      while (x > 1.0) {
      x = sqrt(x);
      ++count;
    }
   resp_threadless[j] = count;
  }
}

DWORD WINAPI function_th( LPVOID lpParam ) {
for (long j = 0; j < n; ++j) {
    int count = 0;
    double x = vetor[j];
      while (x > 1.0) {
      x = sqrt(x);
      ++count;
    }
   resp_threadless[j] = count;
  }
}

I benchmarked the first function by just calling her:

function();

And the second one like this:

HANDLE hThreadArray[1];
DWORD dwThreads[1];
hThreadArray[0] = CreateThread(NULL, 0, function_th, NULL , 0, &(dwThreads[0]));
WaitForMultipleObjects(1, hThreadArray, TRUE, INFINITE);
CloseHandle(hThreadArray[0]);

Keep in mind that I know that calling multiple threads using function_th() will not parallelize it, this is just a test because i was having really strange results, so I decided to see what would happen with one thread and one function using the SAME code.

I tested this in a Intel Atom N270, and windows XP with NUMPROC = 1.

Results: Serial code: 1485 ms One Thread: 425 ms

I've had similar results using multiprocessor machines, and even with code using semaphores to parallelize the work done by the threads.

Does anyone has any idea of what could be happening?

EDIT

Inverting the order, running multiple times each one, etc... -> No change

Higher N -> Thread one is proportionally even faster

Using QueryPerformanceCounter() -> No change

Thread Creation Overhead -> Should make the threaded even one slower, not faster

Original code: http://pastebin.com/tgmp5p1G

Answer 1

It's a cache hit matter. I suspect you did the benchmark in the order you described it in your question. The function was called first and the thread was called after. When you benchmark this in more detail, you will observe the reason: Data (sqrt) is availabel in cache, thus the code will execute much faster. Test to proove:

Run the function() twice or even more often before calling the thread. The second call to function will give the quicker result already.
Call the thread before the function and your result will show the opposite. The function will show the better result.

Reason: All of the sqrt calculation (or at least lots of them) are available in cache and don't have to be recalculated. That's a lot faster.

Serial code much slower than using only one thread in C?

Question

1 answers

solution1
2 2012-10-22 07:56:03

Serial code much slower than using only one thread in C?

Question

1 answers

solution1 2 2012-10-22 07:56:03

solution1
2 2012-10-22 07:56:03