OpenCL C++: memory management error with arrays after thread 64

Question

I have run into a very strange problem while working with OpenCL C++. I have 100 threads, each accessing one element of a 100-element array. For threads 0 to 63 there is no problem: each thread computes and updates its array element correctly. But from thread 64 onward, the elements are updated with wrong values.

Here is how I call the kernel:

kernelGA(cl::EnqueueArgs(queue[iter],
                        cl::NDRange(200 / numberOfDevices)),
                        d_value,
                        d_doubleParameters,
                        buf_half_population, and so on...)

On the kernel side, I identify each thread using:

__kernel void kernelGA (__global double * value,
                        __global double * doubleParameters,
                        __global double * population,
                        __global double * scores, and so on...)

int idx = get_global_id(0); // This gives me 100 threads for each device. (I have two devices)
int size_a = 50;
double tempValue[size_a];

// Copying the global "value" into local array so each thread has its own copy.
for (int i = 0; i < size_a; i++) {
    tempValue[i] = value[i];
}

At this point, each thread has its own tempValue[] array holding the same values. I then apply some computations and formulas to the values of tempValue[] in each thread:

// Apply some computations to tempValue, changing the values in each thread's own copy.
tempValue[i] = some calculations for each thread...

After this, each thread copies every element of its tempValue[] array, contiguously, into a larger array of size (number of threads * size_a). Keep in mind that array indexing goes 0, 1, 2, 3, and so on:

for (int i = 0; i < size_a; i++) {
    totalArray[(idx * size_a) + i] = tempValue[i];
}
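The write-back loop above places thread idx's values in the flat range [idx * size_a, (idx + 1) * size_a). As a minimal host-side sketch of that layout, assuming the output has been read back into a std::vector<double>, the following helpers compute a flat position and pull out one thread's slice for inspection. The names flatIndex and threadSlice are hypothetical and do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position of element i of work-item idx in the flat output array.
std::size_t flatIndex(std::size_t idx, std::size_t sizeA, std::size_t i) {
    assert(i < sizeA);  // element index must stay inside the work-item's slice
    return idx * sizeA + i;
}

// Hypothetical helper: copy the slice written by one work-item out of the flat array.
std::vector<double> threadSlice(const std::vector<double>& totalArray,
                                std::size_t idx, std::size_t sizeA) {
    assert((idx + 1) * sizeA <= totalArray.size());  // slice must be in bounds
    auto first = totalArray.begin() + static_cast<std::ptrdiff_t>(idx * sizeA);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(sizeA));
}
```

Comparing threadSlice(totalArray, 63, 50) with threadSlice(totalArray, 64, 50) on the host is one way to see exactly where the two slices start to differ.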

So when I read totalArray back outside the kernel and print it, the first 64 threads (0-63) have put their values into totalArray[] correctly. But from thread 64 onward, things are messed up. It is not exactly the indexing: I printed only the indexes, and they are accessed correctly by all threads. But the values seem to be wrong.

For example, for threads 0-63 the 3rd, 4th, 5th and 6th elements are 50, 60, 70 and 80 respectively. But for thread 64 onward, the 3rd, 4th, 5th and 6th elements are 80, 90, 100 and 110, as if the values had been shifted backward by a few elements. Why? What is going on here?

Answer 1

There is a problem if multiple devices work on the same array.

You pass

cl::NDRange(200 / numberOfDevices)

as the range,

but you do not pass

cl::NDRange((200 / numberOfDevices)*deviceIndex)

as the offset for each device.

All devices then try to write to the same positions instead of to neighbouring groups.

Also, you are not checking in the kernel whether the total number of threads is less than the array length, so some threads may try to write out of bounds.
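To illustrate the two points above, here is a minimal CPU-only sketch, assuming both devices write into one shared output array and that the work divides evenly (200 / 2 in the question). DeviceRange, rangeForDevice and simulateWrites are hypothetical names, and the loop is only a stand-in for the kernel. In the OpenCL C++ wrapper the offset is, as far as I know, supplied through the cl::EnqueueArgs overload that takes offset, global and local ranges; the sketch does not call it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of the slice of work-items given to one device.
struct DeviceRange {
    std::size_t offset;  // first global id handled by this device
    std::size_t count;   // number of work-items launched on this device
};

// Split totalItems evenly; assumes totalItems is divisible by numberOfDevices.
DeviceRange rangeForDevice(std::size_t totalItems, std::size_t numberOfDevices,
                           std::size_t deviceIndex) {
    assert(numberOfDevices > 0 && deviceIndex < numberOfDevices);
    const std::size_t perDevice = totalItems / numberOfDevices;
    return DeviceRange{perDevice * deviceIndex, perDevice};
}

// CPU stand-in for the kernel's write-back loop, with a bounds check added.
// Counts writes per slot and returns how many work-items actually wrote.
std::size_t simulateWrites(std::vector<int>& writesPerSlot, DeviceRange range,
                           std::size_t sizeA) {
    std::size_t written = 0;
    for (std::size_t local = 0; local < range.count; ++local) {
        // With a global offset set, get_global_id(0) includes the offset.
        const std::size_t idx = range.offset + local;
        if ((idx + 1) * sizeA > writesPerSlot.size()) continue;  // skip out-of-bounds work-items
        for (std::size_t i = 0; i < sizeA; ++i) ++writesPerSlot[idx * sizeA + i];
        ++written;
    }
    return written;
}
```

With rangeForDevice(200, 2, 0) and rangeForDevice(200, 2, 1), every slot of a 200 * 50 array is written exactly once. Launching both devices with offset 0 instead writes the first half twice and leaves the second half untouched.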

Answer 2

So I found the solution to my problem:

The problem was that, even though each thread had its own copy of the value[] array stored in its tempValue[] array:

// Copying the global "value" into local array so each thread has its own copy.
for (int i = 0; i < size_a; i++) {
    tempValue[i] = value[i];
}

The values in the array were being corrupted from thread 64 onward. So I created a larger array of values in the host code (sizeOf(value) * 100), copied the first part of the array into the remaining 99 parts, and sent it to the device. Then I made each thread access its own part of the value[] array using indexing.
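A minimal host-side sketch of that workaround, assuming value holds 50 doubles and there are 100 threads as in the question. The helper names replicateValues and valueForThread are hypothetical; the second one only mirrors on the CPU the indexing each thread would use in the kernel. This reproduces the workaround as described and does not explain why the shared read went wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build `copies` back-to-back copies of `value` on the host.
std::vector<double> replicateValues(const std::vector<double>& value, std::size_t copies) {
    std::vector<double> replicated;
    replicated.reserve(value.size() * copies);
    for (std::size_t c = 0; c < copies; ++c) {
        replicated.insert(replicated.end(), value.begin(), value.end());
    }
    return replicated;
}

// Hypothetical helper: element i of the copy owned by work-item idx.
double valueForThread(const std::vector<double>& replicated, std::size_t idx,
                      std::size_t sizeA, std::size_t i) {
    assert(i < sizeA && idx * sizeA + i < replicated.size());  // stay inside this thread's copy
    return replicated[idx * sizeA + i];
}
```

The replicated vector would then be uploaded in place of the original value buffer, at the cost of 100 times the memory.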

It solved the problem!