
CUDA host-device synchronization

I have been working with CUDA for a while and have a question about synchronization. In the code below, I launch the same kernel in two different ways from main. I understand that in the first case the kernel is launched and control returns to the host immediately, which then prints "gpu call". What happens when the kernel is launched from inside a function? Does control return to the host immediately after the launch, or does the function wait for the kernel to complete before returning?

In other words, if I want to print "gpu call" only after the kernel has finished executing, does launching the kernel from a function remove the need for streams and cudaStreamSynchronize() in this particular case?

#include <iostream>
#include <cuda_runtime.h>

// Kernel: a single thread sets the four device-side flags.
__global__ void initialize(bool *Mcheckin, bool *Mcheckout, bool *Scheckin, bool *Scheckout) {
    Mcheckin[0] = true;
    Mcheckout[0] = true;
    Scheckin[0] = false;
    Scheckout[0] = false;
}

// Host function that wraps the same kernel launch.
int initializedevvar(bool *s1, bool *s2, bool *s3, bool *s4) {
    initialize<<<1,1>>>(s1, s2, s3, s4);
    return 0;
}

int main() {
    bool *state1, *state2, *state3, *state4;

    cudaMalloc(&state1, sizeof(bool));
    cudaMalloc(&state2, sizeof(bool));
    cudaMalloc(&state3, sizeof(bool));
    cudaMalloc(&state4, sizeof(bool));

    // Style 1: launch the kernel directly from main.
    initialize<<<1,1>>>(state1, state2, state3, state4);
    std::cout << "gpu call" << std::endl;
    // ...
    // ...
    // Style 2: launch the kernel through a host function.
    auto dummy = initializedevvar(state1, state2, state3, state4);
    std::cout << "gpu call" << std::endl;

    cudaFree(state1);
    cudaFree(state2);
    cudaFree(state3);
    cudaFree(state4);
    return 0;
}

A kernel launch is asynchronous regardless of whether it is issued from main or from another function. Control returns to the host thread immediately, possibly before the kernel has even begun executing, and the host thread proceeds with whatever host code follows the launch.

Your two cases should behave exactly the same.

You cannot use a launch from inside a function to "circumvent" the use of a synchronizing function (stream-based or otherwise). If you only want to print "gpu call" after a kernel has completed, you will need a synchronizing call of some sort, for example cudaDeviceSynchronize(), before the print statement.
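As a rough illustration of the point above, here is a minimal sketch of a wrapper that launches the kernel and then blocks until it has finished. The helper name initialize_and_wait and the single four-element allocation are assumptions made for this example and are not part of the question's code. cudaDeviceSynchronize() is used here as one possible synchronizing call; cudaStreamSynchronize() on the launch stream would serve the same purpose.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same shape as the kernel in the question: one thread writes four flags.
__global__ void initialize(bool *a, bool *b, bool *c, bool *d) {
    a[0] = true;
    b[0] = true;
    c[0] = false;
    d[0] = false;
}

// Hypothetical helper (not from the question): launch, then wait.
// The launch itself still returns immediately; the explicit
// cudaDeviceSynchronize() is what makes this function block.
cudaError_t initialize_and_wait(bool *a, bool *b, bool *c, bool *d) {
    initialize<<<1, 1>>>(a, b, c, d);
    cudaError_t err = cudaGetLastError();  // reports launch-time errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // blocks until the kernel has finished
}

int main() {
    bool *flags = nullptr;  // assumption: one allocation holding all four flags
    if (cudaMalloc(&flags, 4 * sizeof(bool)) != cudaSuccess) return 1;

    cudaError_t err = initialize_and_wait(flags, flags + 1, flags + 2, flags + 3);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(flags);
        return 1;
    }

    // Reached only after the kernel has completed.
    std::printf("gpu call\n");

    cudaFree(flags);
    return 0;
}
```

Without the cudaDeviceSynchronize() line, the helper would return as soon as the launch was queued, exactly as a launch written directly in main would.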

It seems like this should be something that is quite easy to test.
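One way such a test might look is sketched below. It launches a deliberately slow kernel from inside a function and asks the default stream whether the work has finished, right after the launch and again after synchronizing. The names spin and query_right_after_launch and the cycle count are assumptions chosen for this sketch. The first query is expected to report cudaErrorNotReady on typical hardware, but that depends on timing, so treat it as an observation rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: spins for roughly `cycles` GPU clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: launch from a function and immediately query the
// default stream. cudaErrorNotReady means the launch returned to the host
// while the kernel was still pending or running.
cudaError_t query_right_after_launch(long long cycles) {
    spin<<<1, 1>>>(cycles);
    cudaError_t status = cudaStreamQuery(0);  // non-blocking status check
    cudaDeviceSynchronize();                  // now actually wait for the kernel
    return status;
}

int main() {
    // Assumption: a few hundred million cycles is long enough to observe.
    cudaError_t right_after = query_right_after_launch(500000000LL);
    cudaError_t after_sync = cudaStreamQuery(0);

    std::printf("right after launch: %s\n", cudaGetErrorName(right_after));
    std::printf("after synchronize:  %s\n", cudaGetErrorName(after_sync));
    return 0;
}
```

If the first line reports cudaErrorNotReady, the function returned control to the host before the kernel finished, which is the behavior described above.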
