Working with MASM (ml64), I'm trying to move two unsigned qwords from r9 and r10 into xmm0 as one unsigned 128-bit integer.
So far I came up with this:
mov r9, 111 ;low qword for test
mov r10, 222 ;high qword for test
movq xmm0, r9 ;move low to xmm0 lower bits
movq xmm1, r10 ;move high to xmm1 lower bits
pslldq xmm1, 4 ;shift xmm1 lower half to higher half
por xmm0, xmm1 ;or the 2 halves together
I think it works because
movq rax, xmm0
returns the correct low value
psrldq xmm0, 4
movq rax, xmm0
returns the correct high value
The question is, though: is there a better way to do it? I'm browsing the Intel Intrinsics Guide, but I'm not very good at guessing the names of whatever instructions might exist for this.
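Your byte-shift/OR is broken: pslldq and psrldq take their count in bytes, and you shifted by only 4 bytes instead of 8, so the high value lands in bits 32..95 and overlaps the upper half of the low qword. It only happens to work when your 8-byte qword test values don't have any bits set in their upper half.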
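For reference, a sketch of the question's shift/OR sequence with only the shift counts changed to 8. The test constants are an assumption (not from the question), picked so that every byte is non-zero and any overlap between the two halves would be visible.

```asm
mov r9, 1111111111111111h   ; low qword, bits set in both 32-bit halves (assumed test value)
mov r10, 2222222222222222h  ; high qword (assumed test value)
movq xmm0, r9               ; xmm0 = 0 : r9 (upper 64 bits are zeroed)
movq xmm1, r10              ; xmm1 = 0 : r10
pslldq xmm1, 8              ; byte shift: 8 bytes = 64 bits, so xmm1 = r10 : 0
por xmm0, xmm1              ; xmm0 = r10 : r9

movq rax, xmm0              ; rax = 1111111111111111h (low qword)
psrldq xmm0, 8              ; move the high qword down into the low half
movq rax, xmm0              ; rax = 2222222222222222h (high qword)
```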
The SSE/AVX SIMD instruction sets include an unpack instruction you can use for this:
mov r9, 111 ; test input: low half
mov r10, 222 ; test input: high half
vmovq xmm0, r9 ; move 64 bit wide general purpose register into lower xmm half
vmovq xmm1, r10 ; ditto
vpunpcklqdq xmm0, xmm0, xmm1 ; i.e. xmm0 = low(xmm1) low(xmm0)
That is, the vpunpcklqdq instruction unpacks (or interleaves) the low quadword (64 bits) of each source into one double quadword (i.e. the full XMM register width).
In comparison with your original snippet you save one instruction.
(I've used the VEX-encoded AVX mnemonics. If you want to target SSE2, remove the v prefix and use the two-operand form, because in the legacy encoding the destination is also the first source.)
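A sketch of what the SSE2-only form of the unpack sequence might look like, assuming the same test inputs as above. Without VEX there is no separate destination operand, so xmm0 is both the first source and the result.

```asm
mov r9, 111             ; test input: low half
mov r10, 222            ; test input: high half
movq xmm0, r9           ; xmm0 low half = r9, upper half zeroed
movq xmm1, r10          ; xmm1 low half = r10, upper half zeroed
punpcklqdq xmm0, xmm1   ; xmm0 = low(xmm1) low(xmm0)
```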
Alternatively, you can use an insert instruction to move the second value into the upper half:
mov r9, 111 ; test input
mov r10, 222 ; test input
vmovq xmm0, r9 ; move 64 bit wide general purpose register into lower xmm half
vpinsrq xmm0, xmm0, r10, 1 ; i.e. xmm0 = r10 low(xmm0)
Execution-wise, at the micro-op level, this doesn't make much of a difference, i.e. vpinsrq is as 'expensive' as vmovq + vpunpcklqdq, but it encodes into shorter code.
The non-AVX version of this requires SSE4.1 for pinsrq.
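A sketch of the non-AVX insert variant, followed by a read-back of both halves. The use of pextrq for the check and of rdx as a scratch register are assumptions for illustration; both pinsrq and pextrq need SSE4.1.

```asm
mov r9, 111           ; test input: low half
mov r10, 222          ; test input: high half
movq xmm0, r9         ; low qword in place, upper half zeroed
pinsrq xmm0, r10, 1   ; write r10 into qword element 1 (the upper half)

movq rax, xmm0        ; rax = 111 (low qword)
pextrq rdx, xmm0, 1   ; rdx = 222 (high qword), xmm0 is left unchanged
```

Unlike the psrldq-based check, reading the upper half with pextrq does not destroy the value in xmm0.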
With a little help from your stack:
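push r10 ; high qword first, so it ends up at [rsp+8]
push r9 ; low qword at [rsp]
ifdef ALIGNED
movdqa xmm0, xmmword ptr [rsp] ; requires rsp to be 16-byte aligned at this point
else
movdqu xmm0, xmmword ptr [rsp] ; no alignment requirement
endif
add rsp, 16 ; release the two pushed qwords (use rsp, not esp, in 64-bit code)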
If your __uint128 already lives on the stack, just strip the superfluous instructions (the two pushes and the stack adjustment) and load it directly.