
CUDA parallel threads

I am encountering the following problem when running a CUDA program:

  1. I invoke a simple kernel with a single block that has 2 threads:

    CUDAkernel<<<1, 2>>>();

  2. Inside the kernel I do the following:

    int i = threadIdx.x;
    if (i == 0) { waitabit(); }
    if (i == 1) { waitabit(); }

So both kernel threads invoke the same function, waitabit(), which simply burns some clock cycles:

__device__ void waitabit() {
    clock_t start = clock();
    clock_t now;
    for (;;) {
        now = clock();
        // Elapsed cycles, allowing for the counter wrapping around.
        clock_t cycles = now > start ? now - start : now + (0xffffffff - start);
        if (cycles >= 10000000) {
            break;
        }
    }
}

Now the problem: the function waitabit() delays a thread by 0.008 seconds. I naturally assumed that the threads run in parallel, so both would be stalled in parallel for roughly 0.008 seconds and the whole kernel's delay would be roughly 0.008 seconds.

However, this is not the case. The kernel executes them serially and the delay is 0.016 seconds, i.e. 2 * 0.008.

Is the parallelism done incorrectly?

Thanks in advance!

This is a SIMT machine. Only a single instruction is processed by a warp at any given time. In the event of control flow divergence, the if path and the else path are processed sequentially, not in parallel. When all threads of the warp reach your first if statement, thread 0 processes the if path while all other threads do nothing. The warp then resynchronizes at the end of that if construct and resumes processing in parallel. Then the threads hit the second if statement, and only thread 1 continues while the others wait. Then they resynchronize again at the end of the second if construct and resume processing in lockstep.
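One way to observe this ordering is to have each thread record a timestamp when it enters and leaves its own branch. The following is only a minimal sketch, not the asker's program: the names divergentKernel and runDivergent, the use of clock64() instead of clock(), and the shorter WAIT_CYCLES value are all assumptions made for illustration. On hardware that serializes the two paths as described above, thread 1 should enter its branch only after thread 0 has left; on GPUs with independent thread scheduling (Volta and later) the scheduler may interleave the paths, so the numbers can look different.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical spin length, much shorter than the question's 10000000 cycles.
#define WAIT_CYCLES 1000000LL

// Busy-wait on the 64-bit cycle counter (assumption: clock64() replaces the
// question's clock(), so wraparound handling is not needed here).
__device__ void waitabit() {
    long long start = clock64();
    while (clock64() - start < WAIT_CYCLES) { }
}

// Same branch structure as in the question; each thread additionally records
// when it entered and left its own branch.
__global__ void divergentKernel(long long *enter, long long *leave) {
    int i = threadIdx.x;
    if (i == 0) {
        enter[0] = clock64();
        waitabit();
        leave[0] = clock64();
    }
    if (i == 1) {
        enter[1] = clock64();
        waitabit();
        leave[1] = clock64();
    }
}

// Launches one block of 2 threads and copies the timestamps back to the host.
bool runDivergent(long long hEnter[2], long long hLeave[2]) {
    long long *dEnter = nullptr, *dLeave = nullptr;
    if (cudaMalloc(&dEnter, 2 * sizeof(long long)) != cudaSuccess) return false;
    if (cudaMalloc(&dLeave, 2 * sizeof(long long)) != cudaSuccess) {
        cudaFree(dEnter);
        return false;
    }

    divergentKernel<<<1, 2>>>(dEnter, dLeave);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    ok = ok && cudaMemcpy(hEnter, dEnter, 2 * sizeof(long long),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(hLeave, dLeave, 2 * sizeof(long long),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dEnter);
    cudaFree(dLeave);
    return ok;
}

int main() {
    long long enter[2], leave[2];
    if (!runDivergent(enter, leave)) {
        printf("CUDA error\n");
        return 1;
    }
    // Print cycle counts relative to the moment thread 0 entered its branch.
    for (int t = 0; t < 2; ++t) {
        printf("thread %d: enter %lld, leave %lld\n",
               t, enter[t] - enter[0], leave[t] - enter[0]);
    }
    return 0;
}
```

Both threads are in the same block and therefore on the same multiprocessor, so their clock64() values can be compared directly.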

So the net effect for your example is that the two if statements are processed sequentially. This is expected.
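If the goal is for both threads to wait at the same time, two possible rearrangements are sketched here. These are assumptions for illustration, not part of the original question: sameBranchKernel puts both threads on the same branch so the warp does not diverge, and perBlockKernel is launched as <<<2, 1>>> so the two threads sit in different blocks and therefore in different warps. The helper timeKernelMs and the shortened WAIT_CYCLES are hypothetical names and values. The printed timings depend on the hardware, so no specific figures are claimed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical spin length, much shorter than the question's 10000000 cycles.
#define WAIT_CYCLES 1000000LL

// Busy-wait on the 64-bit cycle counter (assumption: clock64() instead of clock()).
__device__ void waitabit() {
    long long start = clock64();
    while (clock64() - start < WAIT_CYCLES) { }
}

// Variant A: threads 0 and 1 take the same branch, so there is no divergence
// between them and they spin in lockstep.
__global__ void sameBranchKernel() {
    int i = threadIdx.x;
    if (i == 0 || i == 1) { waitabit(); }
}

// Variant B: intended for a <<<2, 1>>> launch. Each block holds one thread,
// so the two threads belong to different warps and do not share a branch.
__global__ void perBlockKernel() {
    int i = blockIdx.x;
    if (i == 0) { waitabit(); }
    if (i == 1) { waitabit(); }
}

// Times one launch with CUDA events; returns milliseconds, or -1 on error.
float timeKernelMs(void (*kernel)(), int blocks, int threads) {
    cudaEvent_t start, stop;
    float ms = -1.0f;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    kernel<<<blocks, threads>>>();
    cudaEventRecord(stop);
    if (cudaEventSynchronize(stop) == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    printf("same branch, <<<1, 2>>>: %.3f ms\n",
           timeKernelMs(sameBranchKernel, 1, 2));
    printf("one thread per block, <<<2, 1>>>: %.3f ms\n",
           timeKernelMs(perBlockKernel, 2, 1));
    return 0;
}
```

Variant A is the simpler choice when both threads really do the same work. Variant B keeps the two separate if statements but relies on the blocks being scheduled concurrently, which is up to the device and not guaranteed.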


 