
Do variables or arrays declared in an OpenCL kernel cost memory?

I am trying to run the following OpenCL code. In the kernel function, I define an array with int arr[1000] = {0};

kernel void test()
{
    int arr[1000] = {0};  // private array: each work-item gets its own copy (unused here, so the compiler may remove it)
}

Then I launch N work-items (threads) to run the kernel:

cl::CommandQueue cmdQueue(context, device); // context and device are assumed to be set up elsewhere
cmdQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(N), cl::NullRange); // kernel here is the one running test()

My question is: since OpenCL runs the work-items in parallel, does this mean the peak memory usage will be N * 1000 * sizeof(int)?

This is not the way to OpenCL (yes, that's what I meant :).

The kernel function operates on operands passed in from the host (CPU), so you would allocate your array from the host using clCreateBuffer (cl::Buffer in the C++ bindings) and pass it to the kernel with clSetKernelArg (cl::Kernel::setArg). Your kernel does not declare or allocate that device memory; it simply receives it as a __global pointer argument. When you then run the kernel with clEnqueueNDRangeKernel and a global size of 1000, the OpenCL implementation runs one work-item for each of those 1000 ints, and each work-item finds its element with get_global_id(0).
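The host-allocated, one-work-item-per-element pattern can be sketched without a GPU. This is a minimal CPU-only emulation in plain C++, not the OpenCL API: `testKernel` and `runOnHostBuffer` are hypothetical names, the `std::vector` stands in for the buffer created with clCreateBuffer, the loop stands in for clEnqueueNDRangeKernel, and `gid` plays the role of `get_global_id(0)`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an OpenCL C kernel along the lines of:
//   kernel void test(global int* arr) { arr[get_global_id(0)] = 0; }
// 'gid' plays the role of get_global_id(0).
void testKernel(int* arr, std::size_t gid) {
    arr[gid] = 0;  // each work-item writes only its own element
}

// Hypothetical host-side driver: the buffer is allocated once, outside the
// kernel, and one "work-item" runs per element (sequentially here, in
// parallel on a real device).
std::vector<int> runOnHostBuffer(std::size_t n) {
    std::vector<int> buffer(n, -1);  // like clCreateBuffer: n ints in total
    for (std::size_t gid = 0; gid < n; ++gid) {
        testKernel(buffer.data(), gid);
    }
    return buffer;
}
```

With n = 1000 the total allocation is 1000 ints no matter how many work-items run; no work-item owns a private 1000-element array.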

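If, on the other hand, you meant to allocate 1000 ints per work-item (device thread), then yes, each work-item gets its own copy, so in the worst case your calculation of N * 1000 * sizeof(int) is right. In practice, an implementation typically only needs storage for the work-items that are in flight on the device at the same time, not for all N at once. Note that such an array lives in private memory (the __private address space), not local memory. Work-items can also access local, global, and constant memory, but private memory is the per-work-item space, and it is severely limited (see here on how to check this for your device). A 1000-int (4000-byte) private array will typically be placed in much slower off-chip memory rather than registers, so it is unlikely to perform well, and on some devices it may not work at all.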
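The arithmetic for a per-work-item array of 1000 ints can be sketched in a few lines of C++. This is a back-of-the-envelope model, not an OpenCL query: `worstCaseBytes` and `residentBytes` are hypothetical helpers, and `resident` is a hypothetical, device-specific number of work-items in flight at once that OpenCL does not report directly. The 4-byte size comes from OpenCL C, where `int` is always 32 bits. To see how much private memory a compiled kernel actually uses per work-item, you can query `CL_KERNEL_PRIVATE_MEM_SIZE` with `clGetKernelWorkGroupInfo`.

```cpp
#include <algorithm>
#include <cstddef>

// OpenCL C defines int as exactly 32 bits.
constexpr std::size_t kClIntBytes = 4;

// Upper bound: all n work-items hold their private array at the same time.
std::size_t worstCaseBytes(std::size_t n, std::size_t elemsPerItem) {
    return n * elemsPerItem * kClIntBytes;
}

// Rough estimate: only 'resident' work-items (a hypothetical,
// device-specific figure) are in flight at once, so only they
// need backing storage simultaneously.
std::size_t residentBytes(std::size_t n, std::size_t resident,
                          std::size_t elemsPerItem) {
    return std::min(n, resident) * elemsPerItem * kClIntBytes;
}
```

For N = 1,000,000 the worst case is 4,000,000,000 bytes (about 4 GB), while a device that kept, say, 4096 work-items resident would need roughly 16 MB at any one moment; the real figure depends on the device and the compiler.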

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM