
Moving 2 QWORDs from general purpose registers into an XMM register as high/low

Working with MASM for ml64, I'm trying to move 2 unsigned qwords from r9 and r10 into xmm0 as an unsigned 128-bit integer.

So far I came up with this:

mov r9, 111             ;low qword for test
mov r10, 222            ;high qword for test

movq xmm0, r9           ;move low to xmm0 lower bits
movq xmm1, r10          ;move high to xmm1 lower bits
pslldq xmm1, 4          ;shift xmm1 lower half to higher half   
por xmm0, xmm1          ;or the 2 halves together

I think it works because

movq rax, xmm0

returns the correct low value, and

psrldq xmm0, 4
movq rax, xmm0

returns the correct high value.

The question is, though: is there a better way to do it? I'm browsing the Intel intrinsics guide, but I'm not very good at guessing the names of whatever instructions it may have.

Your byte-shift/OR is broken because you only shifted by 4 bytes, not 8 (the same goes for the psrldq in your test). It only appears to work when your 8-byte qword test values don't have any bits set in their upper half.
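To make the difference visible, here is a minimal C sketch using SSE2 intrinsics that mirrors the movq / pslldq / por sequence with a 4-byte and an 8-byte shift. The function names (`pack_shift4`, `pack_shift8`, `check_shift`) are made up for this illustration, and it assumes an x86-64 compiler that ships `<emmintrin.h>`. With the test values 111 and 222, the 4-byte shift leaves 222 in bits 32..39 of the low qword instead of in the high qword.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Question's sequence as written: shift the high value by only 4 bytes. */
static void pack_shift4(uint64_t lo, uint64_t hi, uint64_t out[2])
{
    __m128i a = _mm_cvtsi64_si128((long long)lo);  /* movq   xmm0, r9  */
    __m128i b = _mm_cvtsi64_si128((long long)hi);  /* movq   xmm1, r10 */
    b = _mm_slli_si128(b, 4);                      /* pslldq xmm1, 4   */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(a, b));  /* por */
}

/* Same sequence with the shift count corrected to 8 bytes. */
static void pack_shift8(uint64_t lo, uint64_t hi, uint64_t out[2])
{
    __m128i a = _mm_cvtsi64_si128((long long)lo);
    __m128i b = _mm_cvtsi64_si128((long long)hi);
    b = _mm_slli_si128(b, 8);                      /* pslldq xmm1, 8   */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(a, b));
}

/* Hypothetical self-check: out[0] is the low qword, out[1] the high qword. */
static void check_shift(void)
{
    uint64_t out[2];

    pack_shift4(111, 222, out);
    assert(out[0] == (111ULL | (222ULL << 32)));   /* high value leaked into low qword */
    assert(out[1] == 0);                           /* high qword is still empty */

    pack_shift8(111, 222, out);
    assert(out[0] == 111);
    assert(out[1] == 222);
}
```

Calling `check_shift()` from a `main` should pass silently; the first pair of assertions documents the broken layout rather than a desired one.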


The SSE/AVX SIMD instruction sets include an unpack instruction you can use for this:

mov r9, 111         ; test input: low half
mov r10, 222        ; test input: high half

vmovq xmm0, r9      ; move 64 bit wide general purpose register into lower xmm half
vmovq xmm1, r10     ; ditto

vpunpcklqdq xmm0, xmm0, xmm1    ; i.e. xmm0 = low(xmm1) low(xmm0)

That means the vpunpcklqdq instruction unpacks (or interleaves) the low quad-word (= 64 bits) of each source into a double quad-word (i.e. the full XMM register width).

Compared with your original snippet, you save one instruction.

(I've used the VEX-encoded AVX mnemonics. If you want to target SSE2, remove the v prefix and use the two-operand forms, e.g. punpcklqdq xmm0, xmm1.)
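Since the question mentions the intrinsics guide: the same movq + movq + punpcklqdq idea can be sketched in C roughly as follows. This is an illustration under the assumption of an x86-64 target with SSE2; the helper names (`pack_unpack`, `low64`, `high64`, `check_unpack`) are invented here and are not part of any library.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Build hi:lo in one XMM value: two movq-style moves, then punpcklqdq. */
static __m128i pack_unpack(uint64_t lo, uint64_t hi)
{
    __m128i a = _mm_cvtsi64_si128((long long)lo);  /* low qword = lo, high qword = 0 */
    __m128i b = _mm_cvtsi64_si128((long long)hi);  /* low qword = hi, high qword = 0 */
    return _mm_unpacklo_epi64(a, b);               /* result = low(b) : low(a) */
}

/* Read back the low qword (movq rax, xmm0). */
static uint64_t low64(__m128i v)
{
    return (uint64_t)_mm_cvtsi128_si64(v);
}

/* Read back the high qword by first moving it into the low position. */
static uint64_t high64(__m128i v)
{
    return (uint64_t)_mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
}

/* Hypothetical self-check with values that use all 64 bits of each half. */
static void check_unpack(void)
{
    __m128i v = pack_unpack(0x1122334455667788ULL, 0x99AABBCCDDEEFF00ULL);
    assert(low64(v)  == 0x1122334455667788ULL);
    assert(high64(v) == 0x99AABBCCDDEEFF00ULL);
}
```

In C you could also write `_mm_set_epi64x(hi, lo)` and leave the instruction choice to the compiler; the explicit version above is only meant to show which intrinsic corresponds to which instruction.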


Alternatively, you can use an insert instruction to move the second value into the upper half:

mov r9, 111         ; test input
mov r10, 222        ; test input

vmovq xmm0, r9      ; move 64 bit wide general purpose register into lower xmm half

vpinsrq xmm0, xmm0, r10, 1    ; i.e. xmm0 = r10 low(xmm0)

Execution-wise, on a micro-op level, this doesn't make much of a difference, i.e. vpinsrq is as 'expensive' as vmovq + vpunpcklqdq, but it encodes into shorter code.

The non-AVX version of this requires SSE4.1 for pinsrq.
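For completeness, the insert variant has an intrinsic too, `_mm_insert_epi64` from `<smmintrin.h>`. The sketch below assumes GCC or Clang on x86-64 and uses the `target("sse4.1")` function attribute so that it builds without a global `-msse4.1` flag; on other compilers that attribute would need to be replaced. The names `pack_insert` and `check_insert` are hypothetical.

```c
#include <assert.h>
#include <smmintrin.h>  /* SSE4.1: _mm_insert_epi64 */
#include <stdint.h>

/* movq + pinsrq: writes the low qword to out[0] and the high qword to out[1].
   The attribute (GCC/Clang specific, an assumption of this sketch) enables
   SSE4.1 code generation for this one function only. */
static __attribute__((target("sse4.1")))
void pack_insert(uint64_t lo, uint64_t hi, uint64_t out[2])
{
    __m128i v = _mm_cvtsi64_si128((long long)lo);  /* movq   xmm0, r9      */
    v = _mm_insert_epi64(v, (long long)hi, 1);     /* pinsrq xmm0, r10, 1  */
    _mm_storeu_si128((__m128i *)out, v);
}

/* Hypothetical self-check; requires a CPU with SSE4.1 at run time. */
static void check_insert(void)
{
    uint64_t out[2];
    pack_insert(0x1122334455667788ULL, 0x99AABBCCDDEEFF00ULL, out);
    assert(out[0] == 0x1122334455667788ULL);
    assert(out[1] == 0x99AABBCCDDEEFF00ULL);
}
```

The index argument `1` selects the upper 64-bit lane, matching the immediate in the assembly version.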

With a little help from your stack:

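    push   r10                          ; high qword, ends up at [rsp+8]
    push   r9                           ; low qword, ends up at [rsp]
ifdef ALIGNED
    movdqa xmm0, xmmword ptr [rsp]      ; only valid if rsp is 16-byte aligned here
else
    movdqu xmm0, xmmword ptr [rsp]
endif
    add    rsp, 16                      ; release the 16 bytes again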

If your __uint128 happens to live on the stack, just strip the superfluous instructions.
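The same "go through memory" idea looks roughly like this in C with SSE2 intrinsics. It is only a sketch: the two-element buffer stands in for the two pushed qwords, the function names (`pack_from_memory`, `check_memory`) are invented for the example, and an optimizing compiler is free to turn it into a register-only sequence.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Place lo at the lower address and hi right above it, then load all
   16 bytes at once (the movdqu variant, so no alignment is assumed). */
static __m128i pack_from_memory(uint64_t lo, uint64_t hi)
{
    uint64_t buf[2] = { lo, hi };                  /* like push r10 / push r9 */
    return _mm_loadu_si128((const __m128i *)buf);  /* movdqu xmm0, [buf] */
}

/* Hypothetical self-check: store the vector back and compare both halves. */
static void check_memory(void)
{
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out,
                     pack_from_memory(0x1122334455667788ULL, 0x99AABBCCDDEEFF00ULL));
    assert(out[0] == 0x1122334455667788ULL);
    assert(out[1] == 0x99AABBCCDDEEFF00ULL);
}
```

If the 128-bit value already sits in memory, only the load remains, which is the point of the remark above.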
