Writing directly into vertex buffer

The DirectX 9 application/game I inherited uses dynamic vertex buffers. Each frame, it:

  1. locks the vertex buffer
  2. cycles through meshes and writes vertex data to a temporary buffer (dynamically allocated at program start) until it's full
  3. copies the contents of the temporary buffer to the vertex buffer
  4. repeats steps 2 and 3 until all data is copied
  5. unlocks the vertex buffer
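
For reference, steps 2 to 4 can be sketched roughly as below. This is a minimal sketch, not the game's actual code: `Vertex`, `Mesh`, and `fill_via_staging` are hypothetical names, and `dst` stands in for the pointer that `IDirect3DVertexBuffer9::Lock` would return, so the snippet compiles without the DirectX headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical vertex layout and mesh type, for illustration only.
struct Vertex { float x, y, z; float u, v; };
struct Mesh { std::vector<Vertex> vertices; };

// Copies mesh vertices into `dst` through a fixed-size staging buffer.
// `dst` stands in for the locked vertex buffer pointer (an assumption).
// Returns the number of vertices written; stops once `dst` is full.
std::size_t fill_via_staging(const std::vector<Mesh>& meshes,
                             Vertex* dst, std::size_t dst_capacity,
                             std::vector<Vertex>& staging)
{
    assert(dst != nullptr && !staging.empty());
    std::size_t written = 0;  // vertices already copied to dst
    std::size_t staged = 0;   // vertices waiting in the staging buffer

    // Step 3: copy the staged vertices to the destination in one block.
    auto flush = [&] {
        if (staged == 0) return;
        std::memcpy(dst + written, staging.data(), staged * sizeof(Vertex));
        written += staged;
        staged = 0;
    };

    // Step 2: cycle through the meshes, staging vertices until the buffer is full.
    for (const Mesh& mesh : meshes) {
        for (const Vertex& v : mesh.vertices) {
            if (written + staged == dst_capacity) {  // destination would overflow
                flush();
                return written;
            }
            staging[staged++] = v;
            if (staged == staging.size()) flush();   // step 4: repeat
        }
    }
    flush();  // copy whatever is left over
    return written;
}
```

Every vertex is copied twice here: once into `staging` and once into `dst`. Whether that second copy buys anything is the question.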

My question is, is the part with the temporary buffer necessary? Is there a reason why I shouldn't write vertex data directly into the vertex buffer?
I haven't found any evidence of this practice in the official documentation, and I don't trust the previous programmer enough.

Disclaimer: I don't know how DirectX vertex buffers work, so I might be wrong here.

It would probably be slower: a vertex buffer is allocated to optimize access from the GPU, i.e. preferably somewhere in the GPU's own memory. That means accessing it directly from the CPU is much slower than accessing ordinary RAM. Copying a whole array, on the other hand, can be done relatively fast, so it is better to prepare such an array in main memory and copy it to the vertex buffer in one go.

The temporary buffer is not required, with a caveat.

DirectX dynamic vertex buffers are optimised for read access by the GPU and write access by the CPU. The write access optimisation is called write combining, and it involves a different mechanism from normal memory caching. The CPU will batch writes together, provided that you write to the memory in 4/8/16 byte chunks and in order.

Note that it's up to the driver to decide what kind of memory you get back from a lock on a dynamic buffer. It may not be write combined, but treating it as such is the best bet.

Write combined memory is not cached, so reading from it is a performance disaster.

This might explain why the game you inherited uses a temporary buffer: it would make sense if the code reads from the temporary buffer as well as writing to it, or makes no effort to write components in order (for example, writing all positions first and then all texture coordinates).
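
A minimal sketch of the write-only, in-order pattern, assuming a hypothetical interleaved `Vertex` layout and separate source arrays for positions and texture coordinates. Here `dst` stands in for the pointer returned by `IDirect3DVertexBuffer9::Lock`; nothing in the snippet depends on DirectX itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved layout: position followed by texture coordinate.
struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };
struct Vertex { Vec3 position; Vec2 texcoord; };

// Writes `count` vertices to `dst` front to back and never reads from `dst`.
void write_vertices_in_order(Vertex* dst,
                             const Vec3* positions,
                             const Vec2* texcoords,
                             std::size_t count)
{
    assert(count == 0 || (dst && positions && texcoords));
    for (std::size_t i = 0; i < count; ++i) {
        // Assemble the vertex in a local (ordinary cached memory) first.
        Vertex v;
        v.position = positions[i];
        v.texcoord = texcoords[i];
        // Then store it once, so destination addresses only ever increase.
        dst[i] = v;
    }
    // Pattern to avoid on write-combined memory: one pass that fills every
    // dst[i].position and a second pass that fills every dst[i].texcoord.
    // That revisits the buffer out of order. Reading dst back, for example
    // `dst[i].position.x += offset`, is worse still.
}
```

If the existing code already fills vertices this way, pointing it at the locked pointer instead of the temporary buffer should be a small change. If it reads back or writes component by component, the temporary buffer is probably doing useful work.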

There is no need for the temporary buffer. The pointer you are given back from Lock is, in essence, actually already a temporary buffer. The driver can only realistically begin any meaningful operations on it once you unlock the buffer.

If you use D3DLOCK_DISCARD, then the driver has no obligation to honour reads with any sensible data. So the implementation can perfectly well return malloc(size).
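
To make that concrete, here is a toy model of the idea. It is not the Direct3D API and does not claim to show how any real driver works: `ToyDynamicBuffer`, `lock_discard`, and `unlock` are hypothetical names invented for the sketch. A discard-style lock hands out fresh storage with unspecified contents, and the "driver" only takes the data when the buffer is unlocked.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy model only (hypothetical, not Direct3D).
class ToyDynamicBuffer {
public:
    explicit ToyDynamicBuffer(std::size_t bytes) : size_(bytes) {}

    // Models a lock with D3DLOCK_DISCARD: returns fresh storage whose
    // contents are unspecified, much like malloc(size) would.
    void* lock_discard() {
        assert(!pending_ && "already locked");
        pending_.reset(new unsigned char[size_]);
        return pending_.get();
    }

    // Models the unlock: only now does the "driver" take the data.
    void unlock() {
        assert(pending_ && "not locked");
        submitted_.push_back(std::move(pending_));
    }

    std::size_t size() const { return size_; }
    std::size_t submitted_count() const { return submitted_.size(); }
    const unsigned char* last_submitted() const { return submitted_.back().get(); }

private:
    std::size_t size_;
    std::unique_ptr<unsigned char[]> pending_;                  // storage handed to the caller
    std::vector<std::unique_ptr<unsigned char[]>> submitted_;   // storage the "driver" now owns
};
```

In this model the caller's own staging buffer would just be a second copy of `pending_`, which is the sense in which the locked pointer is already a temporary buffer.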

If you don't use D3DLOCK_DISCARD, then, well, that's a separate question, really.
