OpenCL version of cudaMemcpyToSymbol & optimization

Can someone tell me the OpenCL equivalent of cudaMemcpyToSymbol for copying __constant data to the device and reading it back on the host?
Or will the usual clEnqueueWriteBuffer(...) do the job?
I could not find much help in the forum. A few lines of demo would suffice.

Also, should I expect the same kind of optimization in OpenCL as CUDA gets from its constant cache?

Thanks

I have seen people use cudaMemcpyToSymbol() to set up constants for a kernel so that the compiler could take advantage of those constants when optimizing the code. If one were to set up a memory buffer in OpenCL to pass such constants to the kernel, the compiler would see only run-time values and could not use them to optimize the code.

Instead, the solution I found is to replace the cudaMemcpyToSymbol() call with a print to a string that defines the symbol for the compiler. The compiler accepts definitions of the form -D FOO=bar, which set the symbol FOO to the value bar.

Not sure about OpenCL.Net, but in plain OpenCL: yes, clEnqueueWriteBuffer is enough (just remember to create the buffer with the CL_MEM_READ_ONLY flag set).

Here is a demo from the Nvidia GPU Computing SDK (OpenCL/src/oclQuasirandomGenerator/oclQuasirandomGenerator.cpp):

/* Create a read-only device buffer sized for the whole table. */
c_Table[i] = clCreateBuffer(cxGPUContext, CL_MEM_READ_ONLY,
                            QRNG_DIMENSIONS * QRNG_RESOLUTION * sizeof(unsigned int),
                            NULL, &ciErr);
/* Blocking write (CL_TRUE): copy the host table into the buffer. */
ciErr |= clEnqueueWriteBuffer(cqCommandQueue[i], c_Table[i], CL_TRUE, 0,
                              QRNG_DIMENSIONS * QRNG_RESOLUTION * sizeof(unsigned int),
                              tableCPU, 0, NULL, NULL);
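On the kernel side, a buffer filled this way is typically received through a pointer argument in the __constant address space. The kernel below is a hypothetical sketch, not the SDK's actual kernel; the names lookup, table, out, and n are made up for illustration. It is OpenCL C, compiled at run time by the OpenCL driver rather than by the host compiler.

```c
/* Hypothetical kernel, not taken from the SDK sample. */
__kernel void lookup(__constant unsigned int *table, /* filled via clEnqueueWriteBuffer */
                     __global unsigned int *out,
                     const unsigned int n)
{
    size_t gid = get_global_id(0);

    if (gid < n)
        out[gid] = table[gid];  /* read-only access through the constant address space */
}
```

The host would bind the buffer with clSetKernelArg(kernel, 0, sizeof(cl_mem), &c_Table[i]) and fetch results with clEnqueueReadBuffer. The size available for such arguments is device-dependent and can be queried with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.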

Constant memory in CUDA and in OpenCL is exactly the same and provides the same type of optimization. That is, if you use an nVidia GPU. On ATI GPUs, it should act similarly. And I doubt that constant memory would give you any benefit over global memory when run on a CPU.
