
Running zero blocks in CUDA

I have a loop like this:

while ( ... ) {
    ...
    kernel<<<blocks, threads>>>( ... );
}

and in some iterations blocks or threads have the value 0. When I use this, my code runs. My question is whether this is considered bad practice, and whether there are any other bad side effects.

It's bad practice because it will interfere with proper CUDA error checking.

If you do proper error checking, your kernel launches that have all-zero values for block or grid dimensions will throw an error.
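
For example, with the usual pattern of querying cudaGetLastError() immediately after the launch (a minimal sketch; the empty kernel and the zero grid dimension are only there to reproduce the situation from the question), the failure becomes visible. On current toolkits such a launch typically reports "invalid configuration argument":

#include <stdio.h>

__global__ void kernel()
{
}

int main()
{
    // Grid dimension of zero, as in the question's problematic iterations.
    kernel<<<0, 128>>>();

    // Proper error checking: query the launch status right away.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));  // e.g. "invalid configuration argument"

    return 0;
}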

It's preferable to write error free programs for a variety of reasons.

Instead, include a test for these cases and skip the kernel launch when your dimensions are zero. The small overhead in C-code to do this will be more than offset by the reduced API overhead by not making the spurious kernel launch request.
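
A minimal sketch of that guard (the kernel body, the work size n, and the 256-thread block size are placeholders, not taken from the question):

#include <stdio.h>

__global__ void kernel(int n)
{
    // Placeholder body; the real kernel would do its per-element work here.
}

int main()
{
    int n = 0;                             // hypothetical work size; may be zero in some iterations
    int threads = 256;                     // assumed block size
    int blocks  = (n + threads - 1) / threads;

    // Skip the launch entirely when there is no work to do.
    if (blocks > 0) {
        kernel<<<blocks, threads>>>(n);

        // Error checking now only reports genuine failures.
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess)
            fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    }
    return 0;
}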

I tried a zero-block kernel invocation by simply writing the following empty kernel.

File:

#include <stdio.h>

// Empty kernel used only to measure the launch overhead.
__global__ void fg()
{
}

int main()
{
    // Launch with a grid of zero blocks, as in the question.
    fg<<<0, 1>>>();
    return 0;
}

The only side effect I noticed was in the time required for execution.

Run time:

real 0m0.242s, user 0m0.004s, sys 0m0.148s.

When I run the same file with the kernel invocation commented out, the time overhead decreases.

Run time:

real 0m0.003s, user 0m0.000s, sys 0m0.000s.

This side effect arises from the overhead of making the kernel launch request for zero blocks, which in this small test also includes the one-time CUDA context initialization triggered by the first CUDA call.
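
To separate the per-call launch overhead from that one-time initialization, one rough approach (a sketch, reusing the empty fg kernel from above and using cudaFree(0) only as a warm-up to force initialization before timing) is:

#include <stdio.h>
#include <chrono>

__global__ void fg()
{
}

int main()
{
    // Warm-up: force CUDA context creation so it is not counted below.
    cudaFree(0);

    auto t0 = std::chrono::high_resolution_clock::now();
    fg<<<0, 1>>>();                          // the zero-block launch request
    cudaError_t err = cudaGetLastError();    // consume the resulting error
    auto t1 = std::chrono::high_resolution_clock::now();

    double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    printf("launch attempt took %.2f us (%s)\n", us, cudaGetErrorString(err));
    return 0;
}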
