Profiling C++ threads with clock()

I am trying to measure how gcc threads perform on my system. I've written some very simple measurement code, something like this:

start = clock();
for(int i=0; i < thread_iters; i++) {
  pthread_mutex_lock(dataMutex);
  data++;
  pthread_mutex_unlock(dataMutex);
}
end = clock();

I do the usual subtraction and division by CLOCKS_PER_SEC and get an elapsed time of about 2 seconds for 100000000 iterations. I then change the profiling code slightly so that I measure the time of each individual mutex lock/unlock call:

for(int i=0; i < thread_iters; i++) {
  start1 = clock();
  pthread_mutex_lock(dataMutex);
  end1 = clock();
  lock_time+=(end1-start1);

  data++;

  start2 = clock();
  pthread_mutex_unlock(dataMutex);
  end2 = clock();
  unlock_time+=(end2-start2);
}

For the same number of iterations, the times I get are lock: ~27 seconds, unlock: ~27 seconds.

I get why the total time for the program increases: there are more timer calls in the loop. But the time for the system calls should still add up to less than 2 seconds. Can someone help me figure out where I went wrong? Thanks!

The clock calls also measure the time it takes to call clock and return from it. This introduces a bias into the measurement. That is, clock takes its time sample somewhere deep inside the function, and it then has to return from there before your code runs. Likewise, when you take the end measurement, clock has to be called and control has to reach the point deep inside the function where the time is actually obtained before that sample can be taken. So you're including all of that overhead as part of the measurement.
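Plugging in the numbers from the question makes the size of this bias concrete. This is only a rough estimate, assuming the reported totals are spread evenly over the 10^8 iterations:

```latex
\begin{align*}
\text{whole loop:}\quad & \frac{2\ \text{s}}{10^{8}} = 20\ \text{ns per iteration (lock, increment, unlock)}\\
\text{timed lock:}\quad & \frac{27\ \text{s}}{10^{8}} = 270\ \text{ns per measured lock call}
\end{align*}
```

Since a full iteration costs only about 20 ns, nearly all of the 270 ns must come from the clock calls themselves rather than from the mutex operation being timed.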

You must find out how much time elapses between consecutive clock calls (by sampling many pairs of clock calls to get an accurate average). That gives you a baseline bias: how much time it takes to execute nothing at all between two clock samples. You then carefully subtract that bias from your measurements.
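The bias estimate described above could be sketched roughly as follows. The helper names clock_pair_bias_ticks and corrected_seconds are made up for illustration and are not from the original post. The sketch also assumes clock has enough resolution that an average over many pairs is meaningful, which may not hold on every platform.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper (not from the original post): average number of clock
// ticks that elapse between two back-to-back clock() calls with nothing
// in between. This is the baseline bias of one start/end pair.
double clock_pair_bias_ticks(long pairs) {
    assert(pairs > 0);
    std::clock_t total = 0;
    for (long i = 0; i < pairs; ++i) {
        std::clock_t a = std::clock();
        std::clock_t b = std::clock();
        total += b - a;
    }
    return static_cast<double>(total) / static_cast<double>(pairs);
}

// Hypothetical helper: subtract the estimated bias from an accumulated raw
// measurement (e.g. lock_time) and convert the result to seconds.
// `samples` is the number of start/end pairs that were accumulated.
double corrected_seconds(std::clock_t raw_ticks, long samples, double bias_ticks) {
    double corrected =
        static_cast<double>(raw_ticks) - bias_ticks * static_cast<double>(samples);
    if (corrected < 0.0) {
        corrected = 0.0;  // noise can push the estimate below zero; clamp it
    }
    return corrected / CLOCKS_PER_SEC;
}
```

With the question's variables, this would be used as corrected_seconds(lock_time, thread_iters, clock_pair_bias_ticks(1000000)). If the corrected value comes out close to zero, the operation is too short to be timed individually this way.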

But calls to clock can themselves disturb performance, so you still may not get an accurate answer. Calls into the kernel to read the clock disturb your L1 cache and instruction cache. For fine-grained measurements like this, it is better to drop down to inline assembly and read a cycle-counting register from the CPU.
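As one possible illustration of reading a CPU counter directly, here is a minimal sketch that assumes GCC or Clang on x86, where the RDTSC instruction is available. The names read_ticks and average_ticks are hypothetical. The std::chrono fallback exists only so the sketch compiles on other targets. Note that RDTSC is not a serializing instruction, and on many recent x86 processors it counts at a constant reference rate rather than in actual core cycles, so treat the results as approximate.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: read a fine-grained tick counter.
// On x86 with GCC/Clang this executes RDTSC via inline assembly; on any
// other target it falls back to steady_clock (an assumption for portability).
inline std::uint64_t read_ticks() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: average ticks spent in `body` over `reps` runs.
// The cost of read_ticks() itself is still included in each sample.
template <typename F>
double average_ticks(F&& body, std::uint64_t reps) {
    assert(reps > 0);
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < reps; ++i) {
        std::uint64_t start = read_ticks();
        body();
        total += read_ticks() - start;
    }
    return static_cast<double>(total) / static_cast<double>(reps);
}
```

Passing an empty lambda to average_ticks gives the counter's own baseline, which can then be subtracted from the figure obtained for a lambda that wraps pthread_mutex_lock.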

clock is best used the way you have it in your first example: take samples around something that executes for many iterations, and then divide by the number of iterations to estimate the single-iteration time.
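A self-contained version of that first approach might look like the sketch below. The names data_mutex, shared_data and seconds_per_iteration are illustrative, and the mutex is statically initialized here, which the original snippet does not show. The loop is uncontended because only one thread runs it.

```cpp
#include <cassert>
#include <ctime>
#include <pthread.h>

// Illustrative globals; the original post does not show how its mutex
// and counter are declared.
static pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
static long shared_data = 0;

// Time the whole loop with a single pair of clock() calls, then divide by
// the iteration count to estimate the CPU time of one iteration.
double seconds_per_iteration(long iters) {
    assert(iters > 0);
    std::clock_t start = std::clock();
    for (long i = 0; i < iters; ++i) {
        pthread_mutex_lock(&data_mutex);
        shared_data++;
        pthread_mutex_unlock(&data_mutex);
    }
    std::clock_t end = std::clock();
    double total_seconds = static_cast<double>(end - start) / CLOCKS_PER_SEC;
    return total_seconds / static_cast<double>(iters);
}
```

Because only two clock calls are made for the entire loop, their overhead is amortized over all iterations and becomes negligible for large iteration counts.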
