OpenCL instantiating local memory array: invalid pointer error in kernel

I'm trying to create 2 local arrays for a kernel to use. My goal is to copy a global input buffer into the first array (arr1), and instantiate the second array (arr2) so its elements can be accessed and set later.

My kernel looks like this:

__kernel void do_things (__global uchar* in, __global uchar* out, 
uint numIterations, __local uchar* arr1, __local uchar* arr2)
{
  size_t work_size = get_global_size(0) * get_global_size(1);

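  // async_work_group_copy returns the event to wait on; pass 0 as the
  // last argument when no earlier event is being shared.
  event_t event = async_work_group_copy(arr1, in, work_size, 0);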
  wait_group_events(1, &event);

  int cIndex = (get_global_id(0) * get_global_size(1)) + get_global_id(1);
  arr2[cIndex] = 0;

  //Do other stuff later
}

In the C++ code I'm calling this from, I set the kernel arguments like this:

//Create input and output buffers
cl_mem inputBuffer = clCreateBuffer(context, CL_MEM_READ_ONLY |
    CL_MEM_COPY_HOST_PTR, myInputVector.size(), (void*) 
    myInputVector.data(), NULL);
cl_mem outputBuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
    myInputVector.size(), NULL, NULL);

//Set kernel arguments.
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&inputBuffer);
clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*)&outputBuffer);
clSetKernelArg(kernel, 2, sizeof(cl_uint), &iterations);
clSetKernelArg(kernel, 3, sizeof(inputBuffer), NULL);
clSetKernelArg(kernel, 4, sizeof(inputBuffer), NULL);

Where myInputVector is a vector full of uchars.

Then, I enqueue it with a 2D work size, rows * cols big. myInputVector has a size of rows * cols.

//Execute the kernel
size_t global_work_size[2] = { rows, cols }; //2d work size
status = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,
    global_work_size, NULL, 0, NULL, NULL);

The problem is, I'm getting crashes when I run the kernel. Specifically, this line in the kernel:

arr2[cIndex] = 0;

is responsible for the crash (omitting it makes it so it doesn't crash anymore). The error reads:

*** glibc detected *** ./MyProgram: free(): invalid pointer: 0x0000000001a28fb0 ***

All I want is to be able to access arr2 alongside arr1. arr2 should be the same size as arr1. Given that, why am I getting this bizarre error? Why is this an invalid pointer?

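The issue is that you are allocating only sizeof(cl_mem) bytes for each of your local buffers, because sizeof(inputBuffer) is the size of the cl_mem handle, not of the data it refers to. A cl_mem is simply a typedef of a pointer type, so that is typically 4 or 8 bytes depending on your system.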

What then happens in your kernel is that you access memory beyond the end of the local buffer you allocated, and the device raises a memory fault.
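To make the overrun concrete, here is a small host-side C++ sketch (plain C++, not OpenCL code) that mirrors the kernel's cIndex computation and counts how many work-items would write past a local allocation of a given size. The function names are hypothetical, and the 8-byte figure used in the note below assumes sizeof(cl_mem) is 8, as on a typical 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the kernel's index computation:
//   cIndex = get_global_id(0) * get_global_size(1) + get_global_id(1)
std::size_t flat_index(std::size_t gid0, std::size_t gid1, std::size_t gsize1) {
    assert(gid1 < gsize1);  // a global id is always below its global size
    return gid0 * gsize1 + gid1;
}

// Counts the work-items in a rows x cols NDRange whose arr2[cIndex] write
// would land outside a local buffer of allocated_bytes bytes.
// Elements are uchar, so one element occupies one byte.
std::size_t count_out_of_bounds(std::size_t rows, std::size_t cols,
                                std::size_t allocated_bytes) {
    std::size_t bad = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (flat_index(r, c, cols) >= allocated_bytes) {
                ++bad;
            }
        }
    }
    return bad;
}
```

For example, with a 4 x 4 work size and an 8-byte allocation, count_out_of_bounds(4, 4, 8) reports that 8 of the 16 work-items write out of bounds.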

clSetKernelArg(kernel, 3, myInputVector.size(), NULL);
clSetKernelArg(kernel, 4, myInputVector.size(), NULL);

This should fix your problem. Also note that the size you provide is a size in bytes, so in general you need to multiply the element count by the sizeof of the vector's element type (here the elements are uchars, so the factor is 1).
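As a rough sketch of that byte-size calculation, a small helper can derive the value from the vector itself so the element type is never forgotten. The helper name local_arg_bytes is hypothetical and not part of the OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of bytes needed for a __local buffer that holds one
// element per entry of v. Intended as the arg_size passed to clSetKernelArg
// for a __local pointer argument (with arg_value set to NULL).
template <typename T>
std::size_t local_arg_bytes(const std::vector<T>& v) {
    // clSetKernelArg rejects a __local argument whose size is 0.
    assert(!v.empty());
    return v.size() * sizeof(T);
}

// Intended use on the host (assumes the kernel and vector from the question):
//   clSetKernelArg(kernel, 3, local_arg_bytes(myInputVector), NULL);
//   clSetKernelArg(kernel, 4, local_arg_bytes(myInputVector), NULL);
```

Keep in mind that local memory is allocated per work-group and is limited on every device (the limit can be queried with clGetDeviceInfo and CL_DEVICE_LOCAL_MEM_SIZE), so a large input may not fit even when the size is computed correctly.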
