
Why are my VBOs slower than display lists?

I created two simple voxel engines, literally just chunks that hold cubes. For the first one, I use display lists and can render hundreds of chunks at 60 FPS with no problem, despite the fact that the technology behind it is years old and deprecated by now. With my VBO version, I try to render 27 chunks and I suddenly drop to less than 50 FPS. What gives? I use shaders for my VBO version, but not for the display list one. Without shaders for the VBO version, I still get the same frame rate. I'll post some relevant code:

VBO

Initialization of chunk:

public void initGL() {
    rand = new Random();

    sizeX = (int) pos.getX() + CHUNKSIZE;
    sizeY = (int) pos.getY() + CHUNKSIZE;
    sizeZ = (int) pos.getZ() + CHUNKSIZE;

    tiles = new byte[sizeX][sizeY][sizeZ];

    vCoords = BufferUtils.createFloatBuffer(CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (3 * 4 * 6));
    cCoords = BufferUtils.createFloatBuffer(CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 4 * 6));

    createChunk();

    verticeCount = CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 4 * 6);

    vCoords.flip();
    cCoords.flip();

    vID = glGenBuffers();
    glBindBuffer(GL_ARRAY_BUFFER, vID);
    glBufferData(GL_ARRAY_BUFFER, vCoords, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);

    cID = glGenBuffers();
    glBindBuffer(GL_ARRAY_BUFFER, cID);
    glBufferData(GL_ARRAY_BUFFER, cCoords, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
private void createChunk() {
    for (int x = (int) pos.getX(); x < sizeX; x++) {
        for (int y = (int) pos.getY(); y < sizeY; y++) {
            for (int z = (int) pos.getZ(); z < sizeZ; z++) {
                if (rand.nextBoolean()) {
                    tiles[x][y][z] = Tile.Grass.getId();
                } else {
                    tiles[x][y][z] = Tile.Void.getId();
                }
                vCoords.put(Shape.createCubeVertices(x, y, z, 1));
                cCoords.put(Shape.getCubeColors(tiles[x][y][z]));
            }
        }
    }
}

And then rendering:

public void render() {
    glBindBuffer(GL_ARRAY_BUFFER, vID);
    glVertexPointer(3, GL_FLOAT, 0, 0L);

    glBindBuffer(GL_ARRAY_BUFFER, cID);
    glColorPointer(4, GL_FLOAT, 0, 0L);

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    shader.use();
    glDrawArrays(GL_QUADS, 0, verticeCount);
    shader.release();

    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}

I know I use quads, and that's bad, but I'm also using quads for my display list engine. The shaders are very simple: all they do is take a color and apply it to the vertices. They are so simple that I won't even post them.
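One detail in the VBO code above may be worth double-checking. `glDrawArrays` takes a count of vertices, but `verticeCount` is set to `CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 4 * 6)`, which matches the number of color floats rather than the number of vertices. The position buffer is allocated for `3 * 4 * 6` floats per cube, which suggests 24 vertices per cube. If that reading is right, the draw call asks for four times as many vertices as the buffers hold. The sketch below only spells out the arithmetic; `ChunkMeshMath` and its method names are hypothetical and not part of the original engine, and whether this accounts for the frame rate drop is not established here.

```java
// Hypothetical helper (not from the original post): keeps vertex counts
// and float counts apart so they cannot be confused.
final class ChunkMeshMath {
    static final int FACES_PER_CUBE = 6;
    static final int VERTICES_PER_FACE = 4;   // one GL_QUADS quad per face
    static final int POSITION_FLOATS = 3;     // x, y, z
    static final int COLOR_FLOATS = 4;        // r, g, b, a

    // Number of cubes in a cubic chunk with the given edge length.
    static int cubeCount(int chunkSize) {
        return chunkSize * chunkSize * chunkSize;
    }

    // Number of vertices, i.e. the value glDrawArrays expects as its count.
    static int vertexCount(int chunkSize) {
        return cubeCount(chunkSize) * FACES_PER_CUBE * VERTICES_PER_FACE;
    }

    // Capacity (in floats) needed for the position buffer.
    static int positionFloats(int chunkSize) {
        return vertexCount(chunkSize) * POSITION_FLOATS;
    }

    // Capacity (in floats) needed for the color buffer.
    static int colorFloats(int chunkSize) {
        return vertexCount(chunkSize) * COLOR_FLOATS;
    }
}
```

Under this sketch, the value passed to `glDrawArrays` would come from `vertexCount`, while `positionFloats` and `colorFloats` would only be used to size the buffers.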

Display List

Initialization:

public void init() {
    rand = new Random();

    opaqueID = glGenLists(1);

    tiles = new byte[(int) lPosition.x][(int) lPosition.y][(int) lPosition.z];

    genRandomWorld();
    rebuild();
}
public void rebuild() {
    glNewList(opaqueID, GL_COMPILE);
    glBegin(GL_QUADS);
    for (int x = (int) sPosition.x; x < (int) lPosition.x; x++) {
        for (int y = (int) sPosition.y; y < (int) lPosition.y; y++) {
            for (int z = (int) sPosition.z; z < (int) lPosition.z; z++) {
                if (checkCubeHidden(x, y, z)) {
                    // check if tiles hidden. if not, add vertices to
                    // display list
                    if (type != 0) {
                        Tile.getTile(tiles[x][y][z]).getVertices(x, y, z, 1, spritesheet.getTextureCoordsX(tiles[x][y][z]), spritesheet.getTextureCoordsY(tiles[x][y][z]));
                    } else {
                        Tile.getTile(tiles[x][y][z]).getVertices(x, y, z, 1);
                    }
                }
            }
        }
    }
    glEnd();
    glEndList();
    spritesheet.bind();
}

I should note that in my display list version, I only add the visible cubes. That may be an unfair advantage, but it should not bring the VBO version down to that frame rate with just 27 chunks, versus 500 chunks for the display list version. I render like this:
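For readers comparing the two versions, here is a minimal sketch of the kind of visibility test the display list version relies on: a cube is skipped when all six of its neighbours are solid. The original `checkCubeHidden` is not shown in the post, so `CubeVisibility`, its method names, and the convention that tile id 0 means "empty" are assumptions made for illustration.

```java
// Hypothetical sketch (the post does not show checkCubeHidden).
// Assumption: tile id 0 means "empty"; any other id is a solid cube.
final class CubeVisibility {
    static final byte EMPTY = 0;

    // True when (x, y, z) lies inside the array and holds a solid tile.
    // Cells outside the array count as empty, so border cubes stay visible.
    static boolean isSolid(byte[][][] tiles, int x, int y, int z) {
        if (x < 0 || y < 0 || z < 0
                || x >= tiles.length
                || y >= tiles[x].length
                || z >= tiles[x][y].length) {
            return false;
        }
        return tiles[x][y][z] != EMPTY;
    }

    // True when all six face neighbours are solid, so no face can be seen.
    static boolean isHidden(byte[][][] tiles, int x, int y, int z) {
        return isSolid(tiles, x - 1, y, z) && isSolid(tiles, x + 1, y, z)
            && isSolid(tiles, x, y - 1, z) && isSolid(tiles, x, y + 1, z)
            && isSolid(tiles, x, y, z - 1) && isSolid(tiles, x, y, z + 1);
    }

    // Counts the solid cubes that would actually be added to a mesh.
    static int countVisible(byte[][][] tiles) {
        int visible = 0;
        for (int x = 0; x < tiles.length; x++) {
            for (int y = 0; y < tiles[x].length; y++) {
                for (int z = 0; z < tiles[x][y].length; z++) {
                    if (isSolid(tiles, x, y, z) && !isHidden(tiles, x, y, z)) {
                        visible++;
                    }
                }
            }
        }
        return visible;
    }
}
```

The VBO version shown earlier writes vertices for every cell, including void ones, so applying a test like this before filling `vCoords` and `cCoords` would make the comparison between the two engines more even.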

public void render() {
    if (tiles.length != -1) {
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        glCallList(opaqueID);
    }
}

So, after all of that code, I still wonder why my VBO version is so slow. I do have a one-dimensional list of chunks in my display list version for when I'm calling them to render, and a three-dimensional one in my VBO version, but I think the JVM pretty much eliminates any lag from the extra dimensions. So, what am I doing wrong?

It is hard to answer such a question without the actual project and a profiler at hand, so these are theories:

  • You don't show your display list generation code in detail, so I'm assuming you are doing something like glColor(); glVertex3f(); in a loop (rather than declaring the color once and being done with it).
  • Display list implementations are driver-specific, but they usually store an interleaved array of vertex attributes, because that is much friendlier to the cache (all of a vertex's attributes sit together, tightly aligned to 16 bytes, instead of being separated by the size of a whole array). The VBOs you use, on the other hand, come as two non-interleaved buffers: coordinates and colors. This could cause cache-unfriendly access patterns, especially with large amounts of data.

As noted in comments:

try interleaving your position and colour data in a single buffer. That is the usual recommendation for static data as it gives better memory access patterns during rendering. – GuyRT
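A minimal sketch of what interleaving could look like for the layout used in the question (3 position floats and 4 color floats per vertex). It only builds the interleaved array and the stride and offset values; `InterleavedLayout` and its members are hypothetical names, and the sketch makes no claim about how much frame rate this recovers.

```java
// Hypothetical helper: merges separate position and color arrays into one
// array laid out as [x, y, z, r, g, b, a] per vertex.
final class InterleavedLayout {
    static final int POSITION_FLOATS = 3;
    static final int COLOR_FLOATS = 4;
    static final int FLOATS_PER_VERTEX = POSITION_FLOATS + COLOR_FLOATS;

    // Byte distance between the starts of two consecutive vertices.
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES;

    // Byte offset of the color inside one vertex (it follows the position).
    static final long COLOR_OFFSET_BYTES = (long) POSITION_FLOATS * Float.BYTES;

    static float[] interleave(float[] positions, float[] colors) {
        if (positions.length % POSITION_FLOATS != 0) {
            throw new IllegalArgumentException("positions must hold 3 floats per vertex");
        }
        int vertexCount = positions.length / POSITION_FLOATS;
        if (colors.length != vertexCount * COLOR_FLOATS) {
            throw new IllegalArgumentException("colors must hold 4 floats per vertex");
        }
        float[] out = new float[vertexCount * FLOATS_PER_VERTEX];
        for (int v = 0; v < vertexCount; v++) {
            int dst = v * FLOATS_PER_VERTEX;
            // Position first, then the color of the same vertex right after it.
            System.arraycopy(positions, v * POSITION_FLOATS, out, dst, POSITION_FLOATS);
            System.arraycopy(colors, v * COLOR_FLOATS, out, dst + POSITION_FLOATS, COLOR_FLOATS);
        }
        return out;
    }
}
```

With the same LWJGL calls the question already uses, the interleaved data would be copied into one `FloatBuffer` and uploaded with a single `glBufferData` call. With that one buffer bound, the pointers would become `glVertexPointer(3, GL_FLOAT, STRIDE_BYTES, 0L)` and `glColorPointer(4, GL_FLOAT, STRIDE_BYTES, COLOR_OFFSET_BYTES)`.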
