OpenGL 4.0 GPU drawing feature?

Question

In Wikipedia's and other sources' descriptions of OpenGL 4.0 I read about this feature:

Drawing of data generated by OpenGL or external APIs such as OpenCL, without CPU intervention.

What is this referring to?

Edit:

It seems this must be referring to Draw_Indirect, which I believe somehow extends the draw phase to include feedback from shader programs or from interop programs (basically OpenCL/CUDA). There appear to be a few caveats and tricks to keeping the calls on the GPU for any extended time past the second run, but it should be possible.

If anyone can provide more info on using draw commands without the CPU, or can describe draw indirect better, please feel free to do so. It will be greatly appreciated.

Answer 1

I believe you may be referring to the GL_ARB_draw_indirect functionality, which allows OpenGL to source the DrawArrays or DrawElements parameters from a GPU buffer object that can be filled by OpenGL or OpenCL.

If I'm not mistaken, it's included in core OpenGL 4.

Answer 2

I haven't figured out how OpenGL 4.0 in particular makes this feature work, since as far as I understand it existed before as well. I'm not sure if this answers your question, but I'll tell you what I know about the subject anyway.

It refers to a situation where some library other than OpenGL, such as OpenCL or CUDA, produces data directly into the memory of the graphics card, and OpenGL then continues from where the other library left off and uses that data as one of the following:

a pixel buffer object (PBO), when you want to draw the data to the screen as it is
a texture, when you want to use the graphics data as a part of some other scene
a vertex buffer object (VBO), when you want to use the produced data as some arbitrary attribute input for a vertex shader (one example of this might be a particle system that is simulated with CUDA and rendered with OpenGL)

In a situation like this, it's a very good idea to keep the data on the graphics card all the time and not copy it around, and especially not to copy it through the CPU, because the PCIe bus is very slow compared to the memory bus of the graphics card.

Here's some sample code to do the trick with CUDA and OpenGL for VBOs and PBOs:

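// in the beginning: create the buffer and give it storage
GLuint id;
glGenBuffers(1, &id);  // arguments are (count, pointer to ids)
glBindBuffer(GL_ARRAY_BUFFER, id);
// size_in_bytes is a placeholder for however much data the kernel will write
glBufferData(GL_ARRAY_BUFFER, size_in_bytes, NULL, GL_DYNAMIC_DRAW);

// for every frame
// (these cudaGL* calls are the legacy interop API; newer CUDA versions
//  provide cudaGraphicsGLRegisterBuffer and related calls instead)
cudaGLRegisterBufferObject(id);
void* ptr;  // device pointer to the buffer's storage
cudaGLMapBufferObject(&ptr, id);
// <launch kernel here, writing through ptr>
cudaGLUnmapBufferObject(id);
// <now use the buffer "id" with OpenGL>
cudaGLUnregisterBufferObject(id);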

And here's how you can load the data into a texture:

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, id);
glBindTexture(GL_TEXTURE_2D, your_tex_id);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, 0);

Also note that if you use a more unusual format instead of GL_RGBA, it might be slower, because the driver has to convert all the values.
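As a small aside on the texture upload above: the buffer bound to GL_PIXEL_UNPACK_BUFFER has to hold at least as many bytes as glTexSubImage2D will read. The following is only a sketch of that arithmetic for the GL_RGBA / GL_UNSIGNED_BYTE case, assuming default pixel-store settings; the function names are made up for illustration and are not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes read from the bound GL_PIXEL_UNPACK_BUFFER for a width x height
   upload in GL_RGBA / GL_UNSIGNED_BYTE. Assumes default pixel-store state;
   with 4 bytes per pixel every row is already 4-byte aligned. */
size_t rgba8_upload_bytes(size_t width, size_t height)
{
    const size_t bytes_per_pixel = 4; /* R, G, B, A at one byte each */
    return width * height * bytes_per_pixel;
}

/* Hypothetical check: is a PBO of pbo_bytes big enough for the upload? */
int pbo_is_large_enough(size_t pbo_bytes, size_t width, size_t height)
{
    return pbo_bytes >= rgba8_upload_bytes(width, height);
}

/* Usage example for the 256 x 256 upload shown above. */
void example_upload_size(void)
{
    assert(rgba8_upload_bytes(256, 256) == 262144);
}
```

So the CUDA kernel in the earlier snippet would need to fill 262144 bytes for the 256 x 256 upload.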

I don't know OpenCL, but the idea is the same. Only the function names are different.

Another way to do the same thing is what is called host pinned memory. In that approach you map a range of CPU memory into the graphics card's address space.

Answer 3

To understand what this feature is, you must understand how things worked before.

Pre 4.0, OpenCL could fill OpenGL buffer objects with data. Indeed, regular OpenGL commands could fill OpenGL buffer objects with data, either with transform feedback or by rendering to a buffer texture. This data could be vertex data to be used for rendering.

Only the CPU can initiate the rendering of vertex data (by calling one of the glDraw* functions). Even so, there isn't a need for explicit synchronization here (outside of whatever OpenCL/OpenGL interop requires). Specifically, the CPU doesn't have to read data written by GPU operations.

But this leads to a problem. If OpenCL, or whatever GPU operation, always writes a known number of vertices to the buffer, then everything is fine. However, this does not have to be the case. It is often desirable for a GPU process to write an arbitrary number of vertices. Obviously there needs to be a maximum limit (the size of the buffer). But other than that, you want it to be able to write whatever it wants.

The problem is that OpenCL decided how many to write, but the CPU now needs that number in order to use one of the glDraw* functions. If OpenCL wrote 22,000 vertices, then the CPU needs to pass 22,000 to glDrawArrays, which means reading that count back from the GPU.

What ARB_draw_indirect (a core feature of GL 4.0) does is allow a GPU process to write values into a buffer object that represent the parameters you would pass to a glDraw* function. The only parameters not covered by this are the primitive type and, for glDrawElementsIndirect, the index type, which the CPU still passes directly.
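To make the "parameters in a buffer" idea concrete, here is a minimal sketch of the command layout that glDrawArraysIndirect reads, following the structure given in the ARB_draw_indirect specification. It uses uint32_t in place of GLuint so it compiles without GL headers, and make_draw_command is a hypothetical helper that mimics on the CPU what a GPU-side producer (an OpenCL kernel, for instance) would write after generating its vertices.

```c
#include <assert.h>
#include <stdint.h>

/* One glDrawArraysIndirect command: four tightly packed 32-bit values.
   uint32_t stands in for GLuint here. */
typedef struct {
    uint32_t count;                 /* number of vertices to draw */
    uint32_t prim_count;            /* number of instances */
    uint32_t first;                 /* index of the first vertex */
    uint32_t reserved_must_be_zero; /* reserved in GL 4.0 */
} DrawArraysIndirectCommand;

/* Hypothetical helper: the producer generated `produced` vertices into a
   vertex buffer that holds at most `capacity`, so the count is clamped. */
DrawArraysIndirectCommand make_draw_command(uint32_t produced, uint32_t capacity)
{
    DrawArraysIndirectCommand cmd;
    cmd.count = produced < capacity ? produced : capacity;
    cmd.prim_count = 1;             /* a single, non-instanced draw */
    cmd.first = 0;
    cmd.reserved_must_be_zero = 0;
    return cmd;
}

/* The GPU reads the command as 16 raw bytes, so the layout must not be padded. */
void check_command_layout(void)
{
    assert(sizeof(DrawArraysIndirectCommand) == 16);
}
```

With this in place, the 22,000 from the earlier example lives in the count field of a buffer on the GPU instead of being an argument the CPU has to supply.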

Note that the CPU still controls when the rendering happens. The CPU still decides what buffers vertex data are pulled from. So OpenCL can write several of these glDraw* commands, but until the CPU actually calls glDrawElementsIndirect for one of them, nothing actually gets rendered.

So what you can do is run an OpenCL process that will write some data to existing buffer objects. Then you bind those buffers using the usual vertex setup, like with a VAO. The OpenCL process will write the appropriate rendering command data to other buffer objects, which you will bind as indirect buffers. And then you use glDraw*Indirect to render these commands.
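The steps above can be sketched roughly as follows. When a buffer is bound to GL_DRAW_INDIRECT_BUFFER, the last argument of glDrawElementsIndirect is interpreted as a byte offset into that buffer, so picking one of several GPU-written commands is just an offset calculation. The struct follows the ARB_draw_indirect specification (with fixed-width types standing in for GLuint/GLint); indirect_offset, vao, and indirect_buf are illustrative names, and the GL calls appear only in a comment because they need a live context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One glDrawElementsIndirect command: five tightly packed 32-bit values. */
typedef struct {
    uint32_t count;                 /* number of indices to draw */
    uint32_t prim_count;            /* number of instances */
    uint32_t first_index;           /* offset into the index buffer, in indices */
    int32_t  base_vertex;           /* value added to each index */
    uint32_t reserved_must_be_zero; /* reserved in GL 4.0 */
} DrawElementsIndirectCommand;

/* Hypothetical helper: byte offset of the i-th command in a tightly packed
   indirect buffer. */
size_t indirect_offset(size_t command_index)
{
    return command_index * sizeof(DrawElementsIndirectCommand);
}

/* CPU side, once the GPU process has filled the buffers (needs a GL 4.0 context):
 *
 *   glBindVertexArray(vao);                             // usual vertex setup
 *   glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_buf);
 *   glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
 *                          (const void *)indirect_offset(i));
 */
void check_offsets(void)
{
    assert(sizeof(DrawElementsIndirectCommand) == 20);
    assert(indirect_offset(3) == 60);
}
```

The CPU chooses which command to issue and when, but never looks at the counts stored inside it.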

At no time does the CPU have to read data back from the GPU.

3 answers

Answer 1: score 3, accepted, 2011-02-19 01:36:18
Answer 2: score 2, 2011-02-18 23:57:47
Answer 3: score 0, 2012-03-17 07:35:20