Serial code much slower than using only one thread in C?

Question

So, I was doing some benchmark tests with threads, and I wrote these pieces of code:

resp_threadless[] and resp_threaded[] are global int arrays of size n:

int n = 100000;

void function() {
  for (long j = 0; j < n; ++j) {
    int count = 0;
    double x = vetor[j];
    /* Count how many square roots it takes to bring x down to 1.0. */
    while (x > 1.0) {
      x = sqrt(x);
      ++count;
    }
    resp_threadless[j] = count;
  }
}

DWORD WINAPI function_th( LPVOID lpParam ) {
  for (long j = 0; j < n; ++j) {
    int count = 0;
    double x = vetor[j];
    /* Same loop as function(), run inside a thread. */
    while (x > 1.0) {
      x = sqrt(x);
      ++count;
    }
    resp_threaded[j] = count;
  }
  return 0;
}

I benchmarked the first function by just calling it:

function();

And the second one like this:

HANDLE hThreadArray[1];
DWORD dwThreads[1];
hThreadArray[0] = CreateThread(NULL, 0, function_th, NULL , 0, &(dwThreads[0]));
WaitForMultipleObjects(1, hThreadArray, TRUE, INFINITE);
CloseHandle(hThreadArray[0]);

Keep in mind that I know that running multiple threads with function_th() will not parallelize the work. This is just a test: I was getting really strange results, so I decided to see what would happen with one thread and one plain function call running the SAME code.

I tested this on an Intel Atom N270 running Windows XP with NUMPROC = 1.

Results:
Serial code: 1485 ms
One thread: 425 ms

I've had similar results on multiprocessor machines, and even with code that uses semaphores to parallelize the work done by the threads.

Does anyone have any idea what could be happening?

EDIT

Inverting the order, running each one multiple times, etc. -> No change

Higher N -> The threaded version is proportionally even faster

Using QueryPerformanceCounter() -> No change

Thread creation overhead -> Should make the threaded version slower, not faster

Original code: http://pastebin.com/tgmp5p1G

Answer 1

It's a matter of cache hits. I suspect you ran the benchmark in the order you described in your question: the function was called first and the thread afterwards. When you benchmark this in more detail, you will see the reason: the data (sqrt) is already available in cache, so the code executes much faster. Tests to prove it:

Run function() twice or more before calling the thread. The second call to function() should already give the quicker result.
Call the thread before the function and your result will show the opposite: the function will show the better result.

Reason: All of the sqrt calculations (or at least many of them) are available in cache and don't have to be recalculated. That's a lot faster.
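
The two tests suggested above could be run with a small timing harness along these lines. This is only a sketch in portable C, not the code from the question: time_call_ms(), time_repeated(), sample_workload() and the sample_* variables are hypothetical names introduced here, and sample_workload() is a placeholder that should be swapped for function() from the question. It uses clock() from <time.h> for simplicity; QueryPerformanceCounter() would give finer resolution on Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define SAMPLE_N 100000   /* hypothetical size, same value as n in the question */

static double sample_data[SAMPLE_N];      /* hypothetical input buffer */
static unsigned long workload_calls = 0;  /* how many times the placeholder ran */
static volatile double sample_sink;       /* keeps the loop from being optimized away */

/* Placeholder workload (an assumption): walks the buffer once.
   Replace it with function() from the question. */
static void sample_workload(void) {
    double sum = 0.0;
    for (size_t j = 0; j < SAMPLE_N; ++j)
        sum += sample_data[j];
    sample_sink = sum;
    ++workload_calls;
}

/* Time a single call of fn, in milliseconds as measured by clock(). */
static double time_call_ms(void (*fn)(void)) {
    assert(fn != NULL);
    clock_t start = clock();
    fn();
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Call fn `runs` times back to back and record each timing in out_ms. */
static void time_repeated(void (*fn)(void), int runs, double *out_ms) {
    assert(out_ms != NULL);
    for (int i = 0; i < runs; ++i)
        out_ms[i] = time_call_ms(fn);
}
```

Calling time_repeated(function, 3, ms) with a double ms[3] and printing the three values covers the first test: if a warm cache explains the gap, ms[1] and ms[2] should come out clearly lower than ms[0]. For the second test, time the CreateThread/WaitForMultipleObjects sequence first and the plain call afterwards. If all timings stay roughly the same regardless of order, the cause probably lies somewhere else.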
