OpenCL multiple GPU integral - segfault when changing global size from 32 to 64

I have written a kernel that computes an integral over a given range and adds the result to a variable (one variable per GPU). On the host I add these partial results together to get the value of the integral (in this case x^2 dx). For the range 0-8 my result is 170.666..., which is correct. I used global work sizes of 1, 2, 4, 8, 16, and 32 and it worked for all of them, but when I change the global work size (GWS) to 64 I get a segmentation fault. I have one platform containing 8 GPU cards; each device has its own queue, context, and kernel.
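
For readers unfamiliar with the setup, the partitioning described above can be sketched on the CPU. This is not the kernel from the question: the midpoint rule, the equal split of the range, and the names integrateRange and integrateSplit are all assumptions made for illustration.

```cpp
#include <cassert>

// Integrand from the question: f(x) = x^2.
double f(double x) { return x * x; }

// Midpoint-rule estimate of the integral of f over [lo, hi].
// The midpoint rule is an assumption; the question does not name its method.
double integrateRange(double lo, double hi, int steps) {
    assert(steps > 0);
    const double h = (hi - lo) / steps;
    double sum = 0.0;
    for (int k = 0; k < steps; ++k) {
        sum += f(lo + (k + 0.5) * h) * h;
    }
    return sum;
}

// Splits [lo, hi] into `devices` equal sub-ranges, computes one partial
// result per sub-range (standing in for one GPU each), and adds them up
// the way the host does in the question.
double integrateSplit(double lo, double hi, int devices, int stepsPerDevice) {
    assert(devices > 0);
    const double width = (hi - lo) / devices;
    double total = 0.0;
    for (int d = 0; d < devices; ++d) {
        const double partial =
            integrateRange(lo + d * width, lo + (d + 1) * width, stepsPerDevice);
        total += partial;  // one value per "device", summed on the host
    }
    return total;
}
```

Calling integrateSplit(0.0, 8.0, 8, 1000) gives a value close to 512/3, which matches the 170.666... quoted above. The point to notice is that each device contributes a single number, regardless of how many work-items computed it.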

Here are a few lines from my code:

I'm creating three buffers, which I later pass to the kernel (the third one is for reading the result).

cl_mem bufferA[deviceNumber];
cl_mem bufferB[deviceNumber];
cl_mem bufferC[deviceNumber];
for(int i = 0; i< deviceNumber; i++){
    bufferA[i] = clCreateBuffer(context[i], CL_MEM_READ_WRITE , sizeof(float) * global_size, NULL, &error);
    bufferB[i] = clCreateBuffer(context[i], CL_MEM_READ_ONLY , sizeof(float) * global_size, NULL, &error);
    bufferC[i] = clCreateBuffer(context[i], CL_MEM_WRITE_ONLY, sizeof(float) * global_size, NULL, &error);
}

Later, after creating and building the program, I set the kernel arguments.

for(int i = 0; i< deviceNumber; i++){
    error = clSetKernelArg(kernel[i], 0, sizeof(cl_mem), (void*)&bufferA[i]);
    error = clSetKernelArg(kernel[i], 1, sizeof(cl_mem), (void*)&bufferB[i]);
    error = clSetKernelArg(kernel[i], 2, sizeof(cl_mem), (void*)&bufferC[i]);
    error = clSetKernelArg(kernel[i], 3, sizeof(cl_int), (void*)&global_size);
}

Then I enqueue the buffer writes:

for(int i = 0; i< deviceNumber; i++){
    error = clEnqueueWriteBuffer(commandQueue[i], bufferA[i], CL_FALSE, 0, sizeof(float) * global_size, a, 0, NULL, NULL);
    error = clEnqueueWriteBuffer(commandQueue[i], bufferB[i], CL_FALSE, 0, sizeof(float) * global_size, &b[i], 0, NULL, NULL);
}

Then I enqueue the kernels to do their jobs:

for(int i = 0; i< deviceNumber; i++){
    error = clEnqueueNDRangeKernel(commandQueue[i], kernel[i], 1, NULL, &global_size, &localWorkSize, 0, NULL, NULL);
}

And finally, the place where the segfault occurs:

for(int i = 0; i< deviceNumber; i++){
    std::cout<<"clEnqueueReadBuffer: "<<error<<std::endl;
    error = clEnqueueReadBuffer(commandQueue[i], bufferC[i], CL_TRUE, 0, sizeof(float) * global_size, &c[i], 0, NULL, NULL);
}

I print the error codes everywhere and they are all 0. The last thing I see in the output is the string printed just before clEnqueueReadBuffer, so it crashes in the first iteration of the for loop.

Does anyone know what I am missing here?

Found the fault!

sizeof(float) * global_size

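That size was fine when I was reading back a vector whose length was equal to global_size, but after reworking the code to compute an integral I completely forgot to change it. If you read one variable per device, you need only sizeof(type) bytes, nothing more; asking for more makes clEnqueueReadBuffer write past the end of the host variable. I hope this helps someone.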
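
A minimal sketch of why the original size overruns the host side. It assumes c is a host array holding exactly one float per device (8 here), which the question does not show; readFitsInHostArray, originalReadSize, and fixedReadSize are hypothetical helpers, not part of OpenCL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of OpenCL): returns true when copying
// `bytes` bytes to the address of element `index` stays inside a host
// array of `count` floats.
bool readFitsInHostArray(std::size_t count, std::size_t index, std::size_t bytes) {
    assert(index < count);  // the destination element itself must exist
    return bytes <= (count - index) * sizeof(float);
}

// Size used in the question: one float per work-item.
std::size_t originalReadSize(std::size_t globalSize) {
    return sizeof(float) * globalSize;
}

// Size after the fix: one float per device.
std::size_t fixedReadSize() {
    return sizeof(float);
}

// With the fix, the read from the question would become:
//   clEnqueueReadBuffer(commandQueue[i], bufferC[i], CL_TRUE, 0,
//                       sizeof(float), &c[i], 0, NULL, NULL);
```

Under that assumption, readFitsInHostArray(8, 0, originalReadSize(64)) is false, while readFitsInHostArray(8, 7, fixedReadSize()) is true even for the last device. The same check also fails for a global size of 32, which suggests the smaller sizes were already writing out of bounds and simply did not happen to crash.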
