So, I was doing some benchmark tests with threads, and I wrote these pieces of code:
resp_threadless[] and resp_threaded[] are global int arrays of size n.
int n = 100000;
void function() {
    for (long j = 0; j < n; ++j) {
        int count = 0;
        double x = vetor[j];
        while (x > 1.0) {
            x = sqrt(x);
            ++count;
        }
        resp_threadless[j] = count;
    }
}
DWORD WINAPI function_th( LPVOID lpParam ) {
    for (long j = 0; j < n; ++j) {
        int count = 0;
        double x = vetor[j];
        while (x > 1.0) {
            x = sqrt(x);
            ++count;
        }
        resp_threadless[j] = count;
    }
    return 0; /* a thread procedure must return a DWORD exit code */
}
I benchmarked the first function by just calling it:
function();
And the second one like this:
HANDLE hThreadArray[1];
DWORD dwThreads[1];
hThreadArray[0] = CreateThread(NULL, 0, function_th, NULL , 0, &(dwThreads[0]));
WaitForMultipleObjects(1, hThreadArray, TRUE, INFINITE);
CloseHandle(hThreadArray[0]);
Keep in mind that I know that calling multiple threads using function_th() will not parallelize it. This is just a test: I was getting really strange results, so I decided to see what would happen with one thread and one plain function call running the SAME code.
I tested this on an Intel Atom N270 running Windows XP with NUMPROC = 1.
Results:
Serial code: 1485 ms
One thread: 425 ms
I've had similar results using multiprocessor machines, and even with code using semaphores to parallelize the work done by the threads.
Does anyone have any idea what could be happening?
EDIT
Inverting the order, running each one multiple times, etc. -> No change
Higher N -> The threaded one is proportionally even faster
Using QueryPerformanceCounter() -> No change
Thread creation overhead -> Should make the threaded one even slower, not faster
Original code: http://pastebin.com/tgmp5p1G
It's a cache hit matter. I suspect you ran the benchmark in the order you described in your question: the function was called first and the thread was started after it. When you benchmark this in more detail, you will observe the reason: the data (sqrt) is available in cache, so the code executes much faster.
Test to prove it: call function() twice or even more often before starting the thread. The second call to function() should already give the quicker result. Reason: all of the sqrt calculations (or at least many of them) are available in cache and don't have to be recalculated. That's a lot faster.