
Running zero blocks in CUDA

I have a loop like this:

while ( ... ) {
    ...
    kernel<<<blocks, threads>>>( ... );
}

and in some iterations blocks or threads have the value 0. When I use this, my code runs. My question is whether this is considered bad practice, and whether there are any bad side effects.

It's bad practice because it will interfere with proper CUDA error checking.

If you do proper error checking, kernel launches that have zero values for the block or grid dimensions will report an error.
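As a rough illustration of what such error checking might look like, here is a minimal sketch. The names `noop_kernel` and `checked_launch` are hypothetical and not from the original post. It queries `cudaGetLastError()` right after the launch; with a zero grid or block dimension the launch is expected to fail, typically with an "invalid configuration argument" error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only to illustrate launch-error checking.
__global__ void noop_kernel() {}

// Hypothetical helper: launches noop_kernel with the given configuration
// and returns the status reported for the launch.
cudaError_t checked_launch(unsigned int blocks, unsigned int threads)
{
    noop_kernel<<<blocks, threads>>>();

    // cudaGetLastError() returns the error from the most recent runtime call
    // (here, the launch) and resets the stored error state.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch <<<%u, %u>>> failed: %s\n",
                blocks, threads, cudaGetErrorString(err));
    }
    return err;
}
```

Calling `checked_launch(0, 1)` should return something other than `cudaSuccess`, which is exactly the error that a loop with occasional zero-sized launches would keep reporting.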

It's preferable to write error-free programs, for a variety of reasons.

Instead, include a test for these cases and skip the kernel launch when your dimensions are zero. The small overhead in C code to do this will be more than offset by the reduced API overhead from not making the spurious kernel launch request.
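One possible way to write that test is sketched below. The kernel `scale_kernel` and the wrapper `launch_scale` are hypothetical names chosen for illustration, and the assumption is that the grid size is derived from an element count `n`. The wrapper returns early when there is no work, so a zero-sized launch is never requested.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: scales n floats in place.
__global__ void scale_kernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical wrapper: returns true if a launch was issued,
// false if it was skipped because a launch dimension would be zero.
bool launch_scale(float *d_data, int n, float factor, int threads)
{
    // Guard: with no elements or no threads per block there is nothing to launch.
    if (n <= 0 || threads <= 0) {
        return false;
    }

    // Round up so that every element is covered by some thread.
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d_data, n, factor);
    return true;
}
```

Inside a loop like the one in the question, the same idea reduces to wrapping the launch in `if (blocks > 0 && threads > 0) { ... }`.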

I tried a zero-block kernel invocation by writing the following empty kernel.

File:

#include <stdio.h>

// Empty kernel: does no work.
__global__ void fg()
{
}

int main()
{
    // Launch with zero blocks and one thread per block.
    fg<<<0, 1>>>();
    return 0;
}

The only side effect I noticed was in the time required for execution.

Run time:

real 0m0.242s, user 0m0.004s, sys 0m0.148s.

When I run the same file with the kernel invocation commented out, this time overhead goes away.

Run time:

real 0m0.003s, user 0m0.000s, sys 0m0.000s.

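This side effect arises from the kernel invocation overhead for zero blocks. Since the launch is the only CUDA call in this program, the measured difference presumably also includes the one-time cost of initializing the CUDA runtime.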
