OpenCL - How can my array be so large that it causes a stack overflow?

Question

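I am new to OpenCL and am programming it with the C++ wrapper. I have an older AMD card (Radeon HD 5770), which may be the cause of the problem, but I'd like to rule that out for now.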

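I am trying to "process" an "image". For that I faked a 400 x 400 pixel image as a one-dimensional int array, so my buffer size should be 4 * 400 * 400 = 640,000 bytes, about 640 KB. I don't think that is large at all.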
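To put that size in perspective: 4 * 400 * 400 is 640,000 bytes (exactly 625 KiB), and the Visual C++ linker reserves 1 MB of stack for the main thread by default unless `/STACK` overrides it. The small sketch below only does that arithmetic; `kDefaultStackReserve` is an assumption based on the MSVC default, and the other names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Image dimensions taken from the question.
constexpr std::size_t kWidth = 400;
constexpr std::size_t kHeight = 400;

// One int per pixel. sizeof(int) is 4 with MSVC and with common 64-bit
// Linux/macOS compilers, so this is 4 * 400 * 400 bytes there.
constexpr std::size_t kBufferBytes = sizeof(int) * kWidth * kHeight;

// Assumption: the Visual C++ linker's default stack reserve of 1 MB
// (1,048,576 bytes), which applies unless /STACK says otherwise.
constexpr std::size_t kDefaultStackReserve = 1024 * 1024;

// Convert a byte count to KiB (1 KiB = 1024 bytes).
constexpr double to_kib(std::size_t bytes) {
    return static_cast<double>(bytes) / 1024.0;
}

// Fraction of the default stack reserve that one local array of `bytes` uses.
constexpr double stack_fraction(std::size_t bytes) {
    return static_cast<double>(bytes) / static_cast<double>(kDefaultStackReserve);
}
```

Under that assumption a single local `int` array of this size takes about 61% of the whole stack before anything else is pushed, so a buffer that is tiny by GPU standards can still be too big for the stack.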

Some data I think is relevant:

Max work items per work group: 256
Max work item sizes: (256, 256, 256), but I think x * y * z <= 256 must hold.
Max memory allocation size: 536,870,912 bytes (looks like 1/2 GB)
Catalyst 14.12
AMD SDK 3.0.0 (beta)
Using Visual Studio Community 2013
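The asker's reading of the work-group limits is right: each local dimension must not exceed its entry in `CL_DEVICE_MAX_WORK_ITEM_SIZES`, and the product of all local dimensions must not exceed `CL_DEVICE_MAX_WORK_GROUP_SIZE` (both can be read with `cl::Device::getInfo<...>()`). A per-kernel limit, `CL_KERNEL_WORK_GROUP_SIZE`, can be lower still. A minimal sketch of the device-level check, with a hypothetical helper name:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Returns true if a local (work-group) size respects the device limits:
// every dimension <= its per-dimension maximum, and the product of all
// dimensions <= the maximum work-group size.
bool fits_work_group_limits(const std::array<std::size_t, 3>& local,
                            const std::array<std::size_t, 3>& max_item_sizes,
                            std::size_t max_group_size) {
    std::size_t product = 1;
    for (std::size_t i = 0; i < 3; ++i) {
        if (local[i] > max_item_sizes[i]) {
            return false;  // one dimension alone is too large
        }
        product *= local[i];
    }
    return product <= max_group_size;  // total work items per group
}
```

With the limits above, a 10 x 10 group (100 work items) fits, 16 x 16 (256) just fits, and 16 x 32 (512) does not.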

Some code:

#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <stdio.h>
#include <streambuf>
#include <string>
#include <vector>

#include <CL/cl.hpp>

using namespace System;
using namespace std;
#define IMG_WIDTH 400
#define IMG_HEIGHT 400

int main(array<System::String ^> ^args)
{
    vector<cl::Platform> all_platforms;
    cl::Platform::get(&all_platforms);

    cl::Platform default_platform = all_platforms[0];

    vector<cl::Device> all_devices;
    default_platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
    cl::Device default_device = all_devices[0];     

    cl::Context context({ default_device });

    std::ifstream file("kernels.cl");
    std::string kcode(std::istreambuf_iterator<char>(file),
                      (std::istreambuf_iterator<char>()));

    cl::Program::Sources sources(1,
         std::make_pair(kcode.c_str(), kcode.length() + 1));

    cl::Program program(context, sources);

    if (program.build({ default_device }) != CL_SUCCESS){
        cout << "Error building " << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(default_device) << endl;
        exit(1);
    }

    int h_C[IMG_WIDTH * IMG_HEIGHT]; // initialize the array.
    cl::Buffer d_C(context, CL_MEM_READ_WRITE, sizeof(int) * IMG_WIDTH * IMG_HEIGHT); // create the device memory for this array.

    cl::CommandQueue queue(context, default_device, CL_QUEUE_PROFILING_ENABLE);

    cl::Kernel kernel_to_run(program, "get_row");   
    kernel_to_run.setArg(0, d_C);
    kernel_to_run.setArg(1, IMG_WIDTH);
    kernel_to_run.setArg(2, IMG_HEIGHT);

    cl::Event evt;
    queue.enqueueNDRangeKernel(kernel_to_run, cl::NullRange, cl::NDRange(IMG_WIDTH, IMG_HEIGHT), cl::NDRange(10, 10), NULL, &evt);
    queue.finish();

    /* I think the problem is here. If I comment it out, the program
       will run fine, but I need the device information back to the
       host, though!
    */
    queue.enqueueReadBuffer(d_C, CL_TRUE, 0, sizeof(int) * IMG_WIDTH * IMG_HEIGHT, h_C);

    unsigned long elapsed = (unsigned long)(evt.getProfilingInfo<CL_PROFILING_COMMAND_END>() -
                                            evt.getProfilingInfo<CL_PROFILING_COMMAND_START>());
    std::cout << " result: " << elapsed / (float)10e6 << " ms";

    queue.flush();
    queue.finish();
    delete &d_C;
}
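Apart from the stack array, two lines near the end of this host code are wrong on their own. `delete &d_C;` applies `delete` to an object with automatic storage duration, which is undefined behavior; `cl::Buffer` releases its `cl_mem` in its destructor when `d_C` goes out of scope, so that line can simply be dropped. And the profiling counters (`CL_PROFILING_COMMAND_START`/`END`) are in nanoseconds, so converting to milliseconds needs a division by 1e6; `10e6` is 1e7, which makes the printed time 10 times too small. A minimal sketch of the conversion, assuming the two counter values have already been read into 64-bit unsigned integers (`cl_ulong` in the real API); the function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Profiling timestamps are nanoseconds; 1 ms = 1,000,000 ns.
// Note: the literal 10e6 means 10 * 10^6 = 1e7, not 1e6.
double elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns) {
    return static_cast<double>(end_ns - start_ns) / 1e6;
}
```

Doing the subtraction in 64-bit unsigned arithmetic also avoids truncating through `unsigned long`, which is only 32 bits with MSVC.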

The kernel, which does nothing except store the global row each "pixel" belongs to:

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
__kernel void get_row(__global int *out, int width, int height){

    int r = get_global_id(1);
    int c = get_global_id(0);

    if ((r >= height) || (c >= width))
        return; 

    int gIdx = r * width + c;

    out[gIdx] = r;

}
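To know what the device should write back once the read succeeds, the kernel can be mirrored on the CPU. In this sketch (a hypothetical reference function, not part of the question's code) the outer loop plays the role of `get_global_id(1)` and the inner loop `get_global_id(0)`, and the row-major index `r * width + c` is the same as `gIdx` in the kernel:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the get_row kernel: pixel (r, c) is stored at
// index r * width + c and receives its row number r.
std::vector<int> get_row_reference(int width, int height) {
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int r = 0; r < height; ++r) {        // like get_global_id(1)
        for (int c = 0; c < width; ++c) {     // like get_global_id(0)
            out[static_cast<std::size_t>(r) * width + c] = r;
        }
    }
    return out;
}
```

Comparing the buffer read back from the device against `get_row_reference(IMG_WIDTH, IMG_HEIGHT)` is a quick way to confirm the kernel ran correctly.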

What am I doing wrong? For 400 x 400, the program gives me the error "Process is terminated due to StackOverflowException".

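Is my "image" size (only 400 x 400) too large for the total number of work items?
I chose a work-group size of 100 (10 x 10), so I expect 1600 work groups for 400 x 400. I don't think there is a limit on the number of work groups, even on older devices. Or is there?
Maybe my host code is in the wrong order.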
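On the work-group question: in OpenCL 1.x, when an explicit local size is passed to `enqueueNDRangeKernel`, each global dimension must be an exact multiple of the matching local dimension, otherwise the call fails with `CL_INVALID_WORK_GROUP_SIZE`. The number of groups is then the product of the per-dimension quotients. A sketch of that calculation; the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Work-groups along one dimension. Returns 0 when the local size does not
// evenly divide the global size (an invalid launch in OpenCL 1.x).
std::size_t groups_along(std::size_t global, std::size_t local) {
    if (local == 0 || global % local != 0) {
        return 0;
    }
    return global / local;
}

// Total work-groups for a 2-D launch, or 0 if either dimension is invalid.
std::size_t total_groups_2d(std::size_t gx, std::size_t gy,
                            std::size_t lx, std::size_t ly) {
    return groups_along(gx, lx) * groups_along(gy, ly);
}
```

For 400 x 400 with 10 x 10 groups this gives 40 * 40 = 1600 groups, so the launch configuration itself looks valid; a local size of 12, by contrast, would not divide 400.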

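Any help with this is appreciated. I'm not looking for a new graphics card at all, and I don't want to split the image into smaller rectangles and then divide those into work groups.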

I did the same thing in CUDA (on another computer) with images larger than 400 x 400, without any problems.

Answer 1

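Your variable h_C takes up a lot of stack memory (400 * 400 ints, i.e. 640,000 bytes), and stack memory is very limited. Instead of a stack variable like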

int h_C[IMG_WIDTH * IMG_HEIGHT];

allocate it dynamically, using something like std::vector:

std::vector<int> h_C;
h_C.resize(IMG_WIDTH * IMG_HEIGHT);
...
queue.enqueueReadBuffer(d_C, CL_TRUE, 0, sizeof(int) * IMG_WIDTH * IMG_HEIGHT, h_C.data());
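The vector can also be sized directly in its constructor, and `std::unique_ptr<int[]>` is another heap-backed option; either way `enqueueReadBuffer` only needs a valid `int*` to a large enough block, which `.data()` or `.get()` provides. A minimal sketch of both, with hypothetical function names and the pixel count from the question hard-coded:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Pixel count from the question (IMG_WIDTH * IMG_HEIGHT).
constexpr std::size_t kPixelCount = 400 * 400;

// Option 1: size the vector in its constructor. Elements are
// zero-initialized and the storage is on the heap, not the stack.
std::vector<int> make_host_vector() {
    return std::vector<int>(kPixelCount);
}

// Option 2: a heap array owned by std::unique_ptr. std::make_unique<int[]>
// (C++14) value-initializes the elements, so they start at 0.
std::unique_ptr<int[]> make_host_array() {
    return std::make_unique<int[]>(kPixelCount);
}
```

Declaring the array `static` also moves it off the stack, at the cost of a single shared buffer for the whole program.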


Solution 1
4 2015-02-01 13:05:15
