Broadcast a word to an xmm register

Question

I need to move a 16-bit word eight times into an xmm register for SSE operations.

E.g., I'd like to broadcast the 16-bit word ABCD into the xmm0 register, so that the final result looks like

ABCD | ABCD | ABCD | ABCD | ABCD | ABCD | ABCD | ABCD

I want to do this in order to use the paddw operation later on. So far I've found the pshufd operation, which does what I want, but only for double words (32-bit). pshufw only works, if I'm not mistaken, on 64-bit registers. Is there an operation that does what I'm looking for, or do I have to emulate it in some way with multiple pshufw?

Answer 1

You can achieve the desired goal by performing a shuffle and then an unpack. In NASM syntax:

    ; load 16 bits from memory into all words of xmm0
    ; assuming 16-byte alignment
    pshuflw   xmm0, [mem], 0   ; gives you [ M, M, M, M, ?, ?, ?, ? ]
    punpcklwd xmm0, xmm0       ; gives you [ M, M, M, M, M, M, M, M ]

Note that this reads 16 bytes from mem and thus requires 16-byte alignment, even though only the first 2 bytes are actually used.

If the number is not in memory, or you can't guarantee that reading past the end of it is safe, use something like this:

    ; load ax into all words of xmm0
    movd      xmm0, eax                  ; or movd xmm0, [mem]  4-byte load
    pshuflw   xmm0, xmm0, 0
    punpcklwd xmm0, xmm0
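
For readers working from C rather than assembly, the movd / pshuflw / punpcklwd sequence above can be sketched with SSE2 intrinsics. This is only an illustration, not part of the original answer: the function names `broadcast_word_sse2` and `add_word_to_all` are hypothetical, and the compiler is free to emit different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Broadcast a 16-bit word to all eight lanes, mirroring the
   three-instruction sequence above (hypothetical helper name). */
static __m128i broadcast_word_sse2(uint16_t w)
{
    __m128i v = _mm_cvtsi32_si128(w);   /* movd: w in lane 0, other lanes zero */
    v = _mm_shufflelo_epi16(v, 0);      /* pshuflw imm 0: lanes 0..3 = w */
    return _mm_unpacklo_epi16(v, v);    /* punpcklwd: lanes 0..7 = w */
}

/* Add w to each of eight 16-bit words with a single paddw
   (hypothetical helper name; unaligned load/store for simplicity). */
static void add_word_to_all(uint16_t dst[8], uint16_t w)
{
    __m128i a = _mm_loadu_si128((const __m128i *)dst);
    a = _mm_add_epi16(a, broadcast_word_sse2(w));   /* paddw, wraps on overflow */
    _mm_storeu_si128((__m128i *)dst, a);
}
```

In C, `_mm_set1_epi16` expresses the same broadcast directly and leaves the choice of instruction sequence to the compiler.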

With AVX2, you can use a vpbroadcast* broadcast load or a broadcast from a register source. The destination can be YMM if you like.

    vpbroadcastw  xmm0, [mem]            ; 16-bit load + broadcast

Or

    vmovd         xmm0, eax
    vpbroadcastw  xmm0, xmm0

Memory-source broadcasts of 1- or 2-byte elements still decode to a load + shuffle uop on Intel CPUs, but broadcast-loads of 4-byte or 8-byte chunks are even cheaper: they are handled in the load port with no shuffle uop needed.

Either way, this is still cheaper than the 2 separate shuffles you need without AVX2 or SSSE3 pshufb.
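
The register-source AVX2 broadcast above could be reached from C roughly as follows. This is a sketch under stated assumptions rather than a recommended implementation: it relies on the GCC/Clang-specific `target` attribute and `__builtin_cpu_supports`, the names `broadcast_word_avx2` and `broadcast_word` are hypothetical, and the compiler may choose other instructions than those named in the comments.

```c
#include <immintrin.h>  /* SSE2 and AVX2 intrinsics */
#include <stdint.h>

/* GCC/Clang-specific attribute: build only this function with AVX2
   enabled, so the rest of the file does not need -mavx2. */
__attribute__((target("avx2")))
static __m128i broadcast_word_avx2(uint16_t w)
{
    /* intended to correspond to vmovd + vpbroadcastw xmm, xmm */
    return _mm_broadcastw_epi16(_mm_cvtsi32_si128(w));
}

/* Pick the AVX2 path when the CPU reports it; otherwise fall back
   to the SSE2 shuffle + unpack pair (hypothetical dispatcher). */
static __m128i broadcast_word(uint16_t w)
{
    if (__builtin_cpu_supports("avx2"))
        return broadcast_word_avx2(w);
    __m128i v = _mm_shufflelo_epi16(_mm_cvtsi32_si128(w), 0);  /* pshuflw */
    return _mm_unpacklo_epi16(v, v);                           /* punpcklwd */
}
```

Checking the CPU on every call is only for illustration; real code would normally decide once and reuse the result.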
