Reading non-aligned vertex input variables from a buffer

Question

In vertex input layouts you're allowed to have, for example, a vec3 followed by a vec2 packed tightly together, whereas in uniform blocks and storage buffers a vec3 is padded to 16 bytes. I'd like to know what the reason for this is. I'd also like to know whether, if my vertices are laid out with a vec3 instead of a vec4, I can eventually read those vertices from a storage buffer or through a buffer device address. As I understand it, the vec3s will be padded out to 16 bytes, so the layout no longer matches and the shader won't read the right data. How can you have such a layout and read it from anything other than the input attributes? For example, can you read something like the following from a storage buffer or from a device buffer pointer?

vec3 position;
vec2 tex_coords;
uint normal;

Ordinarily, in buffers, Vulkan would pad position out to 16 bytes.

Answer 1

"I'd like to know what the reason for this is."

On some hardware, vertex input is a dedicated hardware feature. On such hardware, it has its own decompression logic, its own caches, and so on. Since it does not use the same mechanism as SSBOs, it is not restricted the way SSBOs are.

Hardware that doesn't have dedicated vertex input hardware simply emulates this. It generates specialized vertex shader code (which is why the vertex input state cannot be separated from the pipeline) that can read and process the data in the specified format. This shader logic reads the data from a single buffer binding as a single read of X bits, then extracts the individual inputs for each attribute that uses that binding. It also does any normalization work required by the format of that attribute.

This is what allows a vec3 input (which in the VS is 3 floats) to be fed by binary data in the VK_FORMAT_A2R10G10B10_UNORM_PACK32 format (32 bits per data value).
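To make that conversion concrete, here is a CPU-side C++ sketch of the unpacking the vertex fetch has to perform for this format. The function name is made up for illustration. The bit positions follow the Vulkan definition of the format (A in bits 30..31, R in 20..29, G in 10..19, B in 0..9), and each 10-bit field is divided by 1023 to normalize it.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: decode one VK_FORMAT_A2R10G10B10_UNORM_PACK32 value
// into the {R, G, B} floats that a vec3 vertex input would receive.
// Bit layout: A = bits 30..31, R = bits 20..29, G = bits 10..19, B = bits 0..9.
std::array<float, 3> decode_a2r10g10b10_unorm(std::uint32_t packed) {
    const float max_value = 1023.0f;  // 2^10 - 1, the UNORM divisor for 10 bits
    const float r = static_cast<float>((packed >> 20) & 0x3FFu) / max_value;
    const float g = static_cast<float>((packed >> 10) & 0x3FFu) / max_value;
    const float b = static_cast<float>(packed & 0x3FFu) / max_value;
    // The 2-bit alpha (bits 30..31) is not needed for a three-component input.
    return {r, g, b};
}
```

For example, decode_a2r10g10b10_unorm(0x3FFu << 20) yields {1.0, 0.0, 0.0}.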

If you want, you can emulate this behavior yourself by treating the data as an array of uints, locating the right set of 32-bit integers with the index gl_VertexIndex*sizeof(datum) (with the size counted in uints rather than bytes, since it indexes a uint array), and then filling in the values by reading and processing that block of data.
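The index arithmetic of that manual fetch can be modelled on the CPU. The following is only a sketch, assuming the tightly packed 24-byte vertex from the question (6 words per vertex); Vertex, kWordsPerVertex, word_to_float and fetch_vertex are illustrative names, not part of any API. In a shader, the float reinterpretation would be done with GLSL's uintBitsToFloat; here memcpy plays that role.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Tightly packed vertex from the question: vec3 + vec2 + uint = 6 words (24 bytes).
struct Vertex {
    float position[3];
    float tex_coords[2];
    std::uint32_t normal;
};

constexpr std::uint32_t kWordsPerVertex = 6;  // assumption: no padding between vertices

// Reinterpret the bits of a 32-bit word as a float.
float word_to_float(std::uint32_t word) {
    float value;
    std::memcpy(&value, &word, sizeof value);
    return value;
}

// Read vertex number `vertex_index` out of a raw array of 32-bit words.
Vertex fetch_vertex(const std::vector<std::uint32_t>& words, std::uint32_t vertex_index) {
    const std::uint32_t base = vertex_index * kWordsPerVertex;  // index in words, not bytes
    Vertex v{};
    v.position[0]   = word_to_float(words[base + 0]);
    v.position[1]   = word_to_float(words[base + 1]);
    v.position[2]   = word_to_float(words[base + 2]);
    v.tex_coords[0] = word_to_float(words[base + 3]);
    v.tex_coords[1] = word_to_float(words[base + 4]);
    v.normal        = words[base + 5];  // still packed; decode as your format requires
    return v;
}
```

Because the buffer is declared as plain uints, no vec3 alignment rule ever applies to it; the cost is that every member has to be assembled by hand.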

But if you're looking for something simpler, a lot of GPUs support scalar layout, which is a more granular form of storage block layout: members are aligned to the size of their scalar component, so a vec3 needs only 4-byte alignment and occupies 12 bytes.
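To see what this changes for the vertex in the question, here is a small C++ sketch that computes member offsets under std430-style and scalar-style alignment. It is a simplified model that assumes only scalar and vector members (no arrays, matrices or nested structs), and all names in it are illustrative. On the Vulkan side, scalar layout is exposed through VK_EXT_scalar_block_layout (the scalarBlockLayout feature in Vulkan 1.2), and in GLSL through GL_EXT_scalar_block_layout.

```cpp
#include <cstddef>
#include <vector>

// One block member: its size and its alignment under each layout rule.
struct Member {
    std::size_t size;
    std::size_t std430_align;
    std::size_t scalar_align;
};

struct Layout {
    std::vector<std::size_t> offsets;  // byte offset of each member
    std::size_t stride;                // size of one array element
};

constexpr std::size_t round_up(std::size_t value, std::size_t align) {
    return (value + align - 1) / align * align;
}

// Place members in order, rounding each offset up to the member's alignment.
Layout compute_layout(const std::vector<Member>& members, bool scalar) {
    Layout out;
    std::size_t cursor = 0;
    std::size_t max_align = 1;
    for (const Member& m : members) {
        const std::size_t align = scalar ? m.scalar_align : m.std430_align;
        cursor = round_up(cursor, align);
        out.offsets.push_back(cursor);
        cursor += m.size;
        if (align > max_align) max_align = align;
    }
    out.stride = round_up(cursor, max_align);  // struct size is padded to its alignment
    return out;
}

// vec3 position; vec2 tex_coords; uint normal;
const std::vector<Member> kVertexMembers = {
    {12, 16, 4},  // vec3: 12 bytes, 16-byte aligned in std430, 4-byte in scalar
    {8, 8, 4},    // vec2
    {4, 4, 4},    // uint
};
```

With these inputs the scalar rules give offsets 0, 12, 20 and a 24-byte stride, which matches the tightly packed vertex buffer, while the std430 rules give offsets 0, 16, 24 and a 32-byte stride.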
