pthread_join is being a bottleneck
I have an application where pthread_join is the bottleneck. I need help resolving this problem.
void *calc_corr(void *t) {
    begin = clock();
    // do work
    end = clock();
    duration = (double) (1000*((double)end - (double)begin)/CLOCKS_PER_SEC);
    cout << "Time is "<<duration<<"\t"<<h<<endl;
    pthread_exit(NULL);
}
int main() {
    start_t = clock();
    for (ii=0; ii<16; ii++)
        pthread_create(&threads.p[ii], NULL, &calc_corr, (void *)ii);
    for (i=0; i<16; i++)
        pthread_join(threads.p[15-i], NULL);
    stop_t = clock();
    duration2 = (double) (1000*((double)stop_t - (double)start_t)/CLOCKS_PER_SEC);
    cout << "\n Time is "<<duration2<<"\t"<<endl;
    return 0;
}
The time printed in the thread function is in the range of 40ms - 60ms, whereas the time printed in the main function is in the range of 650ms - 670ms. The irony is that my serial code also runs in 650ms - 670ms.
What can I do to reduce the time taken by pthread_join?
Thanks in advance!
On Linux, clock() measures the combined CPU time of the process, summed over all of its threads. It does not measure wall time.
This explains why you get ~640 ms = 16 * 40ms (as pointed out in the comments).
To measure wall time, use a wall-clock timer such as clock_gettime() with CLOCK_MONOTONIC instead of clock().
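As a rough sketch of the difference (not the code from the original answer), the helper below times the same create/join region with both clock() and a monotonic wall clock. The names busy_worker, Timings and time_threads are hypothetical, the 50 ms spin is only a stand-in for the real work, and C++11 with std::chrono::steady_clock is assumed to be available.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <pthread.h>   // build with -pthread

using steady = std::chrono::steady_clock;

// Hypothetical stand-in for the real work: spin on the CPU for about 50 ms.
static void *busy_worker(void *) {
    const auto deadline = steady::now() + std::chrono::milliseconds(50);
    while (steady::now() < deadline) { /* burn CPU */ }
    return nullptr;
}

struct Timings {
    double cpu_ms;   // from clock(): CPU time of the whole process
    double wall_ms;  // from steady_clock: elapsed real time
};

// Run n workers and time the create/join region with both clocks.
Timings time_threads(int n) {
    assert(n > 0 && n <= 64);
    pthread_t tid[64];

    const std::clock_t cpu_begin = std::clock();
    const auto wall_begin = steady::now();

    for (int i = 0; i < n; ++i)
        if (pthread_create(&tid[i], nullptr, busy_worker, nullptr) != 0)
            std::abort();
    for (int i = 0; i < n; ++i)
        pthread_join(tid[i], nullptr);

    const auto wall_end = steady::now();
    const std::clock_t cpu_end = std::clock();

    Timings t;
    t.cpu_ms = 1000.0 * (cpu_end - cpu_begin) / CLOCKS_PER_SEC;
    t.wall_ms =
        std::chrono::duration<double, std::milli>(wall_end - wall_begin).count();
    return t;
}
```

Calling time_threads(16) on a machine with several cores would be expected to report a cpu_ms several times larger than wall_ms, which is the same effect seen in the question. The exact ratio depends on the core count and on system load.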
By creating threads you add overhead to your system: creation time and scheduling time. Creating a thread requires allocating a stack, among other things; scheduling means more context switching.
Also, pthread_join suspends execution of the calling thread until the target thread terminates. This means you wait for thread 1 to finish; when it does, you are rescheduled as quickly as possible, but not instantly; then you wait for thread 2, and so on.
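The blocking behaviour of pthread_join can be illustrated with a small sketch. Here sleep_worker and join_all_ms are hypothetical names, and sleeping stands in for real work. The threads are joined last-to-first, as in the question, and the elapsed time is measured with a wall clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <pthread.h>   // build with -pthread
#include <thread>

// Hypothetical worker: sleep for the number of milliseconds arg points to.
static void *sleep_worker(void *arg) {
    const int ms = *static_cast<const int *>(arg);
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    return nullptr;
}

// Start one thread per entry of delays_ms, join them last-to-first,
// and return the elapsed wall time in milliseconds.
double join_all_ms(int *delays_ms, int n) {
    assert(n > 0 && n <= 16);
    pthread_t tid[16];
    const auto begin = std::chrono::steady_clock::now();

    for (int i = 0; i < n; ++i)
        if (pthread_create(&tid[i], nullptr, sleep_worker, &delays_ms[i]) != 0)
            std::abort();

    // Each call blocks until that particular thread has terminated;
    // threads that finished earlier are joined without further waiting.
    for (int i = n - 1; i >= 0; --i)
        pthread_join(tid[i], nullptr);

    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

With delays of 50, 100, 150 and 200 ms, the returned value should be close to the longest delay (about 200 ms) rather than the 500 ms sum, because the threads run concurrently while main waits. The join order does not change that total.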
Now, your computer has only a few cores, say one or two, and you are creating 16 threads. At best, 2 threads of your program will run at the same time, and just by adding up their clock measurements you get something around 400 ms.
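One way to act on this, which goes beyond what the answer itself proposes, is to size the number of threads to the hardware. The sketch below assumes C++11; pick_thread_count is a hypothetical helper name, and the fallback to a single worker when the core count is unknown is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Choose how many worker threads to start: never more than the number of
// tasks, and never more than the hardware threads the system reports.
unsigned pick_thread_count(unsigned tasks) {
    assert(tasks > 0);
    unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    if (hw == 0)
        hw = 1;  // assumption: fall back to a single worker
    return std::min(tasks, hw);
}
```

For the 16 work items in the question, pick_thread_count(16) would return 2 on a dual-core machine without hyper-threading, so each thread would then process several items instead of one.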
Again, it depends on a lot of things, so I have only quickly skimmed over what is happening.