OpenGL GLPaint threaded rendering

I am currently using a library based on Apple's GLPaint example, which draws on screen with OpenGL. Whenever the canvas saves and restores a session, the lines are redrawn (the progress is visible on screen), and this takes quite a bit of time if there are a lot of points to render. Is there any way to render this in parallel, or simply faster?

This is the drawing code I'm using:

CGPoint start = step.start;
CGPoint end = step.end;

// Convert touch point from UIView referential to OpenGL one (upside-down flip)
CGRect bounds = [self bounds];
start.y = bounds.size.height - start.y;
end.y = bounds.size.height - end.y;

static GLfloat*     vertexBuffer = NULL;
static NSUInteger   vertexMax = 64;
NSUInteger          vertexCount = 0,
count,
i;

[EAGLContext setCurrentContext:context];
glBindFramebufferOES(GL_FRAMEBUFFER_OES, viewFramebuffer);

// Convert locations from Points to Pixels
CGFloat scale = self.contentScaleFactor;
start.x *= scale;
start.y *= scale;
end.x *= scale;
end.y *= scale;

// Allocate vertex array buffer
if(vertexBuffer == NULL)
    vertexBuffer = malloc(vertexMax * 2 * sizeof(GLfloat));

// Add points to the buffer so there are drawing points every X pixels
count = MAX(ceilf(sqrtf((end.x - start.x) * (end.x - start.x) + (end.y - start.y) * (end.y - start.y)) / kBrushPixelStep), 1);
for(i = 0; i < count; ++i) {
    if(vertexCount == vertexMax) {
        vertexMax = 2 * vertexMax;
        vertexBuffer = realloc(vertexBuffer, vertexMax * 2 * sizeof(GLfloat));
    }

    vertexBuffer[2 * vertexCount + 0] = start.x + (end.x - start.x) * ((GLfloat)i / (GLfloat)count);
    vertexBuffer[2 * vertexCount + 1] = start.y + (end.y - start.y) * ((GLfloat)i / (GLfloat)count);
    vertexCount += 1;
}

// Render the vertex array
glVertexPointer(2, GL_FLOAT, 0, vertexBuffer);
glDrawArrays(GL_POINTS, 0, (int)vertexCount);

// Display the buffer
glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER_OES];

OpenGL is not multi-threaded: a rendering context can be current on only one thread at a time, so you have to submit the OpenGL commands for that context from a single thread.

You have a couple of choices:

  1. You can refactor your code to use concurrency to build the data that you send to OpenGL, then submit it to the OpenGL API once it is all available.

  2. You can refactor it to do your calculations in shaders. This moves the computation off the CPU and onto the GPU, which is highly optimized for parallel operation.

Your code above uses realloc to grow a buffer repeatedly inside the for loop. This is dreadfully inefficient, since memory allocation is one of the slowest RAM-based operations on a modern OS. You should refactor your code to calculate the final size of the memory buffer up front, allocate the buffer once at its final size, and not use realloc at all. This should give you a many-times increase in speed with very little effort.

Glancing at your code, it should not be hard at all to refactor your for loop to break the vertex calculation into blocks and submit those blocks to GCD for concurrent processing. The trick is to break the task into work units that are large enough to benefit from parallel processing. There is a certain amount of overhead in setting up a task to run on a background queue, so you want to do enough work in each unit to make that overhead worth it.
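One way the partitioning might look is sketched below in plain C. Every name here (`Chunk`, `chunk_count`, `chunk_range`, `fill_chunk`) and the `MIN_POINTS_PER_CHUNK` threshold are assumptions for illustration, and the threshold would need to be tuned by measuring. Each chunk writes only its own slice of a preallocated buffer, so the chunks are independent; with GCD, each chunk index could be handed to `dispatch_apply` on a concurrent queue, while the GL calls themselves stay on one thread.

```c
#include <stddef.h>

/* Hypothetical minimum number of points per work unit. */
#define MIN_POINTS_PER_CHUNK 1024

typedef struct {
    size_t first; /* index of the first point in this chunk */
    size_t count; /* number of points in this chunk */
} Chunk;

/* How many chunks to use: enough work per chunk, never more than max_chunks,
 * and always at least one. */
static size_t chunk_count(size_t total, size_t max_chunks)
{
    size_t n = total / MIN_POINTS_PER_CHUNK;
    if (n > max_chunks) n = max_chunks;
    if (n < 1) n = 1;
    return n;
}

/* Range of points covered by chunk `index` out of `n` chunks.
 * Any remainder is spread over the first chunks. */
static Chunk chunk_range(size_t total, size_t n, size_t index)
{
    size_t base = total / n;
    size_t extra = total % n;
    Chunk c;
    c.first = index * base + (index < extra ? index : extra);
    c.count = base + (index < extra ? 1 : 0);
    return c;
}

/* Fills one chunk of a segment's points. `v` must hold total * 2 floats.
 * Only the slice belonging to chunk `c` is written. */
static void fill_chunk(float *v, Chunk c, size_t total,
                       float x0, float y0, float x1, float y1)
{
    for (size_t i = c.first; i < c.first + c.count; ++i) {
        float t = (float)i / (float)total;
        v[2 * i + 0] = x0 + (x1 - x0) * t;
        v[2 * i + 1] = y0 + (y1 - y0) * t;
    }
}
```

For a small number of points `chunk_count` returns 1, which amounts to staying on the current thread and avoiding the dispatch overhead entirely.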

I believe the discussion in the comments above revealed the main part of your performance problem. Unless I completely misunderstood it, the high-level structure of your code currently looks like this:

loop over steps
    calculate list of points from start/end points
    render list of points
    present the renderbuffer
end loop

It should be massively faster to present the renderbuffer only once, after all the steps have been rendered:

loop over steps
    generate list of points from start/end points
    draw list of points
end loop
present the renderbuffer
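The restructured loop above could be sketched in C roughly as follows. This is only an outline under stated assumptions: `Step`, `replay_steps` and the callback types are hypothetical, and the counting callbacks stand in for the real work. In the question's code the draw callback would correspond to everything up to glDrawArrays, and the present callback to the glBindRenderbufferOES and presentRenderbuffer calls.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's step object. */
typedef struct {
    float x0, y0, x1, y1;
} Step;

typedef void (*DrawFn)(const Step *step, void *ctx);
typedef void (*PresentFn)(void *ctx);

/* Draws every step into the framebuffer, then presents exactly once. */
static void replay_steps(const Step *steps, size_t n,
                         DrawFn draw, PresentFn present, void *ctx)
{
    for (size_t i = 0; i < n; ++i) {
        draw(&steps[i], ctx);
    }
    present(ctx);
}

/* Counting callbacks used in place of real GL calls for this sketch. */
typedef struct {
    int draws;
    int presents;
} Counters;

static void count_draw(const Step *step, void *ctx)
{
    (void)step;
    ((Counters *)ctx)->draws += 1;
}

static void count_present(void *ctx)
{
    ((Counters *)ctx)->presents += 1;
}
```

Replaying three steps with these callbacks results in three draws and a single present, instead of one present per step.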

Even better, generate a Vertex Buffer Object (VBO) for each step when the step is created, and store the coordinates of the step's points in that buffer. Then your draw logic becomes:

loop over steps
    bind VBO for step
    draw content of VBO
end loop
present the renderbuffer
