CUDA host-device synchronization

Question

I have been working with CUDA for a while and have a question about synchronization. Consider the code below, in which main launches the same kernel in two different ways. I understand that in the first case the kernel is launched, control returns to the host immediately, and "gpu call" is printed. What about launching the kernel from within a function? Does control return to the host immediately after the kernel launch, or does the host wait for the kernel to complete before returning?

In other words, if I want to print "gpu call" only after the kernel has executed, does launching the kernel from a function remove the need for streams and cudaStreamSynchronize() in this particular case?

#include <iostream>

// Kernel: a single thread writes the initial value of each flag.
__global__ void initialize(bool *Mcheckin, bool *Mcheckout,
                           bool *Scheckin, bool *Scheckout) {
    Mcheckin[0] = true;
    Mcheckout[0] = true;
    Scheckin[0] = false;
    Scheckout[0] = false;
}

// Launches the same kernel from inside a host function.
int initializedevvar(bool *s1, bool *s2, bool *s3, bool *s4) {
    initialize<<<1, 1>>>(s1, s2, s3, s4);
    return 0;
}

int main() {
    bool *state1, *state2, *state3, *state4;

    cudaMalloc(&state1, sizeof(bool));
    cudaMalloc(&state2, sizeof(bool));
    cudaMalloc(&state3, sizeof(bool));
    cudaMalloc(&state4, sizeof(bool));
    // Style 1: launch directly from main.
    initialize<<<1, 1>>>(state1, state2, state3, state4);
    std::cout << "gpu call" << std::endl;
    ...
    ...
    // Style 2: launch through the wrapper function.
    auto dummy = initializedevvar(state1, state2, state3, state4);
    std::cout << "gpu call" << std::endl;
    cudaFree(state1);
    cudaFree(state2);
    cudaFree(state3);
    cudaFree(state4);
    return 0;
}

Answer 1

A kernel launch is asynchronous regardless of whether it is made from main or from another function. Control returns to the host thread immediately, typically before the kernel has even begun executing, and the host thread proceeds with whatever host code follows the kernel launch.

Your two cases should behave exactly the same.

You cannot use a launch from inside a function to "circumvent" the need for a synchronizing function (stream-based or otherwise). If you want to print "gpu call" only after a kernel has completed, you will need a synchronizing call of some sort before the print statement.
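As a minimal sketch of what such a synchronizing call could look like, the wrapper below launches the kernel and then blocks the host with cudaDeviceSynchronize(). The name initializedevvar_sync and the error handling are illustrative assumptions, not part of the question's code; cudaStreamSynchronize() on the launch stream would be an alternative if the kernel ran in a non-default stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void initialize(bool *Mcheckin, bool *Mcheckout,
                           bool *Scheckin, bool *Scheckout) {
    Mcheckin[0] = true;
    Mcheckout[0] = true;
    Scheckin[0] = false;
    Scheckout[0] = false;
}

// Hypothetical wrapper (name is an assumption): launches the kernel and
// then blocks the calling host thread until the device has finished.
cudaError_t initializedevvar_sync(bool *s1, bool *s2, bool *s3, bool *s4) {
    initialize<<<1, 1>>>(s1, s2, s3, s4);
    cudaError_t err = cudaGetLastError();  // catches launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // waits; reports execution errors
}

int main() {
    bool *state[4];
    for (auto &p : state) {
        if (cudaMalloc(&p, sizeof(bool)) != cudaSuccess) return 1;
    }

    cudaError_t err =
        initializedevvar_sync(state[0], state[1], state[2], state[3]);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Reached only after the kernel has completed.
    std::printf("gpu call\n");

    for (bool *p : state) cudaFree(p);
    return 0;
}
```

The synchronization lives inside the wrapper here only for convenience; placing cudaDeviceSynchronize() in main just before the print would have the same effect.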

It seems like this should be something that is quite easy to test.
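One possible way to test it, sketched under some assumptions: a deliberately slow kernel is launched once directly and once through a function, and cudaStreamQuery() on the default stream is checked right after each launch. The names slow_kernel, launch_from_function and still_running, and the cycle count, are made up for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Busy-waits on the device for roughly `cycles` clock ticks, then sets a flag.
__global__ void slow_kernel(long long cycles, int *done) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
    *done = 1;
}

// Same launch, but issued from inside a host function.
void launch_from_function(long long cycles, int *done) {
    slow_kernel<<<1, 1>>>(cycles, done);
}

// True if work in the default stream has not finished yet.
bool still_running() {
    return cudaStreamQuery(0) == cudaErrorNotReady;
}

int main() {
    int *done = nullptr;
    if (cudaMalloc(&done, sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(done, 0, sizeof(int));

    // Assumed value: a few hundred milliseconds on a GHz-class device clock.
    const long long cycles = 500000000LL;

    slow_kernel<<<1, 1>>>(cycles, done);
    std::printf("direct launch:  pending after launch = %d\n",
                (int)still_running());
    cudaDeviceSynchronize();

    launch_from_function(cycles, done);
    std::printf("wrapped launch: pending after launch = %d\n",
                (int)still_running());
    cudaDeviceSynchronize();

    std::printf("after sync:     pending = %d\n", (int)still_running());

    cudaFree(done);
    return 0;
}
```

If launches are asynchronous in both cases, the first two lines should report pending work and the last should not. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 forces launches to block, which would change what the first two lines report.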

Answer 1 | 2 | accepted | 2018-01-06 00:41:25
