HLSL multiplication with a vector and a matrix behaves strangely

I've previously dealt with vector/matrix multiplication in HLSL behaving entirely differently than expected, and I compensated by transposing my matrices in my code, blissfully unaware of why this is necessary. But I really can't let this go.

The following summarizes my problem.

Create the projection matrix with XMMatrixPerspectiveFovLH, which gives a matrix that is the transposed projection matrix; at least that's how it appears in memory (I've printed it).

Put this matrix into a constant buffer and view it as a matrix type in HLSL. Then the product of this matrix and a column vector (column vector on the right, see the documentation) actually does the projection. This seemingly contradicts the fact that the matrix passed into the shader is transposed (that is, the result should've been correct if I had multiplied by a row vector).

In a fit of rage, I manually wrote the matrix into a float4x4 in HLSL:

float4x4 m = { 1.358,0,0,0,0,2.41421,0,0,0,0,1.001,1,0,-0.603553,-0.1001,0 };

and I got what should've happened to my cbuffer matrix: a weird transform. Surely, if the HLSL compiler did not generate some code to transpose my matrix, then there should be no difference in my results.

See here for what should've been an answer to my question, but I'm not sure of the accepted answer, namely this:

And it turns out that for some reason, in D3D9 HLSL, mul always expects matrices to be stored in column-major order. However, the D3DX math library stores matrices in row-major order, and as the documentation says, ID3DXBaseEffect::SetMatrix() expects its input in row-major order. It does a transpose behind the scenes to prepare the matrix for use with mul.

Does this mean that HLSL is auto transposing matrices? If so, does it do this to exactly those matrices passed into the shaders, and not to any matrices defined within the shader code itself? How can I know that this is true, for certain? And finally, if this is the case, why is this done at all? Why not just expect the matrices passed into the shader to be in the correct format initially? It seems to me like this is a small performance hit for no reason.

Edit: I've found a way to "fix" this. Using the row_major keyword forces mul to perform as expected using standard math convention. It seems that this keyword alters how the data is put into registers, so it stores each row in a register which presumably then performs a dot product with the vector to be transformed. If true, this reduces my question to "is it faster to store the values in registers consecutively by row, or "interleaved" by column?"; I'm interested to know how it would be faster by column.

This goes back to the ancient history of DirectX...

Firstly, DirectX has long adopted "row-major matrices, row vectors, pre-multiplication, and left-handed coordinates" as the preferred model. OpenGL traditionally used "column-major matrices, column vectors, post-multiplication, and right-handed coordinates". For what that means, see this blog post.
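As a rough illustration of what "row vectors, pre-multiplication" versus "column vectors, post-multiplication" means, here is a minimal C++ sketch. The `Vec4` and `Mat4` types and all function names are made up for this example and are not part of DirectXMath or any other library. It only shows that `v * M` in one convention gives the same numbers as `transpose(M) * v` in the other.

```cpp
#include <cassert>

struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };  // indexed as m[row][col]

// Row-vector convention (DirectX style): result = v * M.
Vec4 mulRowVector(const Vec4& v, const Mat4& a) {
    Vec4 r = {{0, 0, 0, 0}};
    for (int col = 0; col < 4; ++col)
        for (int k = 0; k < 4; ++k)
            r.v[col] += v.v[k] * a.m[k][col];
    return r;
}

// Column-vector convention (OpenGL style): result = M * v.
Vec4 mulColumnVector(const Mat4& a, const Vec4& v) {
    Vec4 r = {{0, 0, 0, 0}};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k)
            r.v[row] += a.m[row][k] * v.v[k];
    return r;
}

Mat4 transposeMat4(const Mat4& a) {
    Mat4 t = {};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            t.m[col][row] = a.m[row][col];
    return t;
}

// Hypothetical check: a translation by (2, 3, 4) written for row vectors
// (translation in the last row) moves the point (1, 1, 1) to (3, 4, 5)
// in both conventions, provided the matrix is transposed for the second one.
bool conventionsAgree() {
    const Mat4 translate = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {2, 3, 4, 1}}};
    const Vec4 p = {{1, 1, 1, 1}};
    const Vec4 a = mulRowVector(p, translate);
    const Vec4 b = mulColumnVector(transposeMat4(translate), p);
    for (int i = 0; i < 4; ++i)
        if (a.v[i] != b.v[i]) return false;
    return a.v[0] == 3 && a.v[1] == 4 && a.v[2] == 5 && a.v[3] == 1;
}
```

Calling `conventionsAgree()` from a test or from `main` should return `true`; the choice of convention changes where the transpose lives, not the result.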

The legacy D3DXMath library reflected this choice, although the modern DirectXMath library supports both left-handed and right-handed view coordinate systems.

XNA Game Studio adopted "row-major matrices, row vectors, pre-multiplication, and right-handed coordinates" because it was considered a bit easier to understand "larger values are farther away" for depth.

The original fixed-function render pipeline also reflected this choice, but for the shift to programmable shader-based rendering this was not mandated. You can implement any combination you want as long as you are consistent.
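To make "consistent" concrete for the situation in the question, here is a small CPU-side C++ sketch. It assumes the 16 floats are written in row-major order on the CPU and then read back under a column-major interpretation, which is the HLSL default packing for constant-buffer matrices. The helper names are invented for the example. Reading row-major data as column-major yields the transpose, so "matrix times column vector" on the reinterpreted data matches "row vector times matrix" on the original.

```cpp
#include <cassert>

// Element (row, col) when the 16 floats are interpreted as row-major.
float atRowMajor(const float* d, int row, int col) { return d[row * 4 + col]; }

// Element (row, col) when the same 16 floats are interpreted as column-major.
float atColumnMajor(const float* d, int row, int col) { return d[col * 4 + row]; }

// out = v * M, with M read row-major (the CPU-side view).
void rowVectorTimesRowMajor(const float* d, const float v[4], float out[4]) {
    for (int col = 0; col < 4; ++col) {
        out[col] = 0.0f;
        for (int k = 0; k < 4; ++k)
            out[col] += v[k] * atRowMajor(d, k, col);
    }
}

// out = M * v, with the same memory read column-major
// (an assumption standing in for the shader-side view).
void columnMajorTimesColumnVector(const float* d, const float v[4], float out[4]) {
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int k = 0; k < 4; ++k)
            out[row] += atColumnMajor(d, row, k) * v[k];
    }
}

// Hypothetical check using arbitrary values: both views give the same result.
bool reinterpretationMatches() {
    const float data[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    const float v[4] = {1, 2, 3, 4};
    float a[4], b[4];
    rowVectorTimesRowMajor(data, v, a);
    columnMajorTimesColumnVector(data, v, b);
    for (int i = 0; i < 4; ++i)
        if (a[i] != b[i]) return false;
    return true;
}
```

Under that assumption no transpose instruction is needed anywhere: the same bytes simply mean a different matrix depending on the packing order used to read them.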

The HLSL compiler defaults to column-major because in the early days of shaders, there were very few instruction slots, so it was worth saving a single instruction. These days, the primary value is that column-major can be done in a more parallel form:

Column-major:

// Mul vector4 * matrix4x4
    dp4 oPos.x, v0, c0
    dp4 oPos.y, v0, c1
    dp4 oPos.z, v0, c2
    dp4 oPos.w, v0, c3

Row-major:

// Mul vector4 * matrix4x4
    mul r0, v0.y, c1
    mad r0, v0.x, c0, r0
    mad r0, v0.z, c2, r0
    mad oPos, v0.w, c3, r0

You'll see that the column-major version can do all four operations independently, but they have to be chained together in the row-major form.
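The two listings can be mimicked on the CPU with a short C++ sketch. This is only an analogy for the instruction dependencies, not shader code, and the `Float4` type and helper names are assumptions made for the example. `transformDp4` computes each output component from its own dot product, while `transformMad` must feed each step's result into the next.

```cpp
#include <cassert>

struct Float4 { float x, y, z, w; };

// Analogue of dp4: four-component dot product.
float dp4(const Float4& a, const Float4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Analogue of mul with a broadcast scalar: s * c.
Float4 mulScalar(float s, const Float4& c) {
    return {s * c.x, s * c.y, s * c.z, s * c.w};
}

// Analogue of mad with a broadcast scalar: s * c + acc.
Float4 mad(float s, const Float4& c, const Float4& acc) {
    return {s * c.x + acc.x, s * c.y + acc.y, s * c.z + acc.z, s * c.w + acc.w};
}

// Column-major registers: cols[i] holds column i of the matrix.
// The four dot products do not depend on each other.
Float4 transformDp4(const Float4& v, const Float4 cols[4]) {
    return {dp4(v, cols[0]), dp4(v, cols[1]), dp4(v, cols[2]), dp4(v, cols[3])};
}

// Row-major registers: rows[i] holds row i of the matrix.
// Each step needs the previous r0, so the operations form a chain.
Float4 transformMad(const Float4& v, const Float4 rows[4]) {
    Float4 r0 = mulScalar(v.y, rows[1]);
    r0 = mad(v.x, rows[0], r0);
    r0 = mad(v.z, rows[2], r0);
    return mad(v.w, rows[3], r0);
}

// Hypothetical check: the same translation matrix stored both ways
// moves (1, 1, 1, 1) to (3, 4, 5, 1) with either routine.
bool bothFormsAgree() {
    const Float4 rows[4] = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {2, 3, 4, 1}};
    const Float4 cols[4] = {{1, 0, 0, 2}, {0, 1, 0, 3}, {0, 0, 1, 4}, {0, 0, 0, 1}};
    const Float4 v = {1, 1, 1, 1};
    const Float4 a = transformDp4(v, cols);
    const Float4 b = transformMad(v, rows);
    return a.x == b.x && a.y == b.y && a.z == b.z && a.w == b.w &&
           a.x == 3 && a.y == 4 && a.z == 5 && a.w == 1;
}
```

Both routines use four vector operations here; the difference the sketch is meant to show is the data dependency between steps, not the instruction count.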
