
OpenCL crash on big 2d range

In my program, I need to run the kernel once on every item of a large 2D array. The program works correctly for small ranges: up to around 50x50, sometimes up to 100x100.

For bigger datasets however, calling the kernel causes the video card driver to crash.

I have tested this program on two computers with different AMD cards, and they exhibit the exact same behaviour. Other, one-dimensional kernels work properly, even for huge datasets of ~10 000 x 10 000 items.

Also, removing the i variable from the matrix[i + (N + 1) * j] expression causes the kernel to work without errors.

Am I setting the range incorrectly, making a mistake in the kernel, or does the problem lie elsewhere?

enqueued range:

cl::EnqueueArgs args(queue,cl::NDRange(offset, offset+1),cl::NDRange(N+1, N),cl::NullRange);

kernel:

void kernel sub(global float* matrix, global const float* vec, int N, int offset) {
  int i = get_global_id(0);
  int j = get_global_id(1);         
  matrix[i + (N + 1) * j] -= matrix[i + (N + 1) * offset] * vec[j]; 
}

One possible reason: if your kernel runs for too long, the driver may kill it and reset. Split the problem area into smaller blocks and enqueue them one at a time.
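A minimal sketch of how the range could be cut into blocks on the host. The `Tile` struct and `split_range` helper are hypothetical names used only for illustration, and the block size is something you would have to tune for your card.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One rectangular block of a 2D range: where it starts and how big it is.
// (Hypothetical helper type, not part of the OpenCL API.)
struct Tile {
    std::size_t offset_x, offset_y;
    std::size_t size_x, size_y;
};

// Split a width x height range into blocks of at most tile x tile items.
// tile must be greater than zero.
std::vector<Tile> split_range(std::size_t width, std::size_t height,
                              std::size_t tile) {
    std::vector<Tile> tiles;
    for (std::size_t y = 0; y < height; y += tile) {
        for (std::size_t x = 0; x < width; x += tile) {
            // Blocks on the right and bottom edges may be smaller than tile.
            tiles.push_back({x, y,
                             std::min(tile, width - x),
                             std::min(tile, height - y)});
        }
    }
    return tiles;
}
```

Each tile would then be enqueued separately, with `cl::NDRange(t.offset_x, t.offset_y)` as the global offset and `cl::NDRange(t.size_x, t.size_y)` as the global size, so no single launch runs long enough to be killed.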

Consider a 100x100 input array, so N = 100. In cl::EnqueueArgs the first NDRange is the global offset and the second is the global size, so i runs from offset to offset + N and j runs from offset + 1 to offset + N. With offset = 0, the maximum value of i is 100 and the maximum of j is 100. Therefore i + (N + 1) * j reaches 100 + 101*100 = 10200, which is outside your 2D array (a buffer of N rows of N + 1 floats would end at index 10099).

When offset = 1, the minimums for i and j will be 1 and 2 respectively, while the maximums will be 101 and 101. Therefore i + (N + 1) * j reaches 101 + 101*101 = 10302. The global size is a count of work items, not an upper bound, so it has to shrink as offset grows.
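The index arithmetic can be checked on the host without touching the GPU. The sketch below relies on the fact that OpenCL work-item IDs in each dimension run from the global offset to offset + size - 1, and it assumes the buffer holds N rows of N + 1 floats. The function names are hypothetical, and the "shrunk" variant is one possible correction, not necessarily what the asker intended.

```cpp
#include <cstddef>

// Inclusive range of values returned by get_global_id() in one dimension.
struct IdRange {
    std::size_t first;
    std::size_t last;
};

// Work-item IDs run from the global offset to offset + size - 1.
IdRange id_range(std::size_t global_offset, std::size_t global_size) {
    return {global_offset, global_offset + global_size - 1};
}

// Largest linear index touched by matrix[i + (N + 1) * j] with the
// enqueue arguments from the question:
//   offset = NDRange(offset, offset + 1), global = NDRange(N + 1, N)
std::size_t max_index_as_posted(std::size_t N, std::size_t offset) {
    IdRange i = id_range(offset, N + 1);
    IdRange j = id_range(offset + 1, N);
    return i.last + (N + 1) * j.last;
}

// Same calculation with the global size reduced by the offset, so that
// i stays in [offset, N] and j stays in [offset + 1, N - 1].
// Requires offset + 1 < N. Assumed fix, for illustration only.
std::size_t max_index_shrunk(std::size_t N, std::size_t offset) {
    IdRange i = id_range(offset, N + 1 - offset);
    IdRange j = id_range(offset + 1, N - offset - 1);
    return i.last + (N + 1) * j.last;
}
```

For N = 100 the posted arguments give a maximum index of 10200 at offset 0 and 10302 at offset 1, while the shrunk global size stays at 10099 in both cases, the last element of a 100 x 101 buffer.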

In my experience, GPUs are not very good at catching segmentation faults when accessing global memory. Your attempt at purposefully creating one may work on some cards some of the time, but there are no guarantees.

The problem could be caused by the local work size and the global work size. With two-dimensional ranges it is important to calculate them properly. It could be that for big values get_global_id(0) ends up larger than what you meant to specify in clEnqueueNDRangeKernel().
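As an illustration of calculating the sizes, here is a sketch of the usual round-up pattern. It only applies when an explicit local size is passed (the question uses cl::NullRange, which lets the runtime choose); in OpenCL 1.x an explicit local size must divide the global size evenly in every dimension. The helper names are hypothetical.

```cpp
#include <cstddef>

// Smallest multiple of local that is >= items; local must be > 0.
std::size_t round_up(std::size_t items, std::size_t local) {
    return (items + local - 1) / local * local;
}

// True when an ID produced by the padded range has no data behind it.
// The kernel would perform the same comparison and return early.
bool is_padding(std::size_t id, std::size_t items) {
    return id >= items;
}
```

With 101 columns and a local size of 16, the global size becomes 112, and the kernel has to skip IDs 101 through 111 itself, for example with an `if (i >= width) return;` guard, where `width` is an extra kernel argument you would need to add.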
