
Is there a way to increase a value in an xmm register?

I was wondering, is there a way to increase a value in an xmm register, or can you only move a value into one?

What I mean is, you can do this:

inc eax

or like this:

inc [ebp+7F00F000]

Is there a way to do the same with an xmm register?

I have tried something that resembles it, but it doesn't work:

  inc [rbx+08]
  movss xmm1,[rbx+08]

I have even tried something really stupid, but it also didn't work:

push edx
pextrw edx,xmm2,0
add edx,1
mov [rbx+08],edx
movss xmm1,[rbx+08]
pop edx

There's no inc equivalent for xmm regs, and there's no immediate-operand form of paddw (so there's no equivalent to add eax, 1 either).

paddw (and its variants for other element sizes) is only available with an xmm/m128 source operand. So if you want to increment one element of a vector, you need to load a constant from memory, or generate it on the fly.

E.g. the cheapest way to increment all (16-bit) elements of xmm0 is:

; outside the loop
pcmpeqw    xmm1, xmm1    ; xmm1 = all-ones = -1 (in each element)

; inside the loop
psubw      xmm0, xmm1    ; xmm0 -= -1 (in each element), i.e. xmm0++

Or

paddw      xmm0, [ones]  ; where ones is a static constant (16-byte aligned, since this is a legacy-SSE memory operand)

Loading the constant from memory is probably only a good idea if it takes more than maybe two instructions to construct it, or if register pressure is a problem.
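For readers working from C rather than assembly, here is a minimal sketch of the same two approaches using SSE2 intrinsics. The function names are made up for illustration, an x86 target with SSE2 is assumed, and the compiler is free to choose different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add 1 to each of the eight 16-bit elements,
 * using the all-ones trick (subtracting -1). */
void inc_all_epi16(int16_t v[8])
{
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    __m128i all_ones = _mm_cmpeq_epi16(x, x);  /* each element equals itself, so every bit is set (-1) */
    x = _mm_sub_epi16(x, all_ones);            /* x - (-1) == x + 1, like psubw */
    _mm_storeu_si128((__m128i *)v, x);
}

/* Hypothetical helper: same result, using a vector constant of ones
 * (like paddw with a constant operand). */
void inc_all_epi16_const(int16_t v[8])
{
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    x = _mm_add_epi16(x, _mm_set1_epi16(1));
    _mm_storeu_si128((__m128i *)v, x);
}
```

Both versions wrap around on overflow, so an element holding 32767 becomes -32768.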


If you want to construct a constant to increment only the low 32-bit element, for example, you might use a byte-shift to zero the other elements:

; hoisted out of the loop
pcmpeqw    xmm1, xmm1    ; xmm1 = all-ones = -1
psrldq     xmm1, 12      ; xmm1 = [ 0 0 0 -1 ]  (high element first)


; in the loop
psubd      xmm0, xmm1    ; only the low 32-bit element changes
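As a rough C equivalent of the masked-constant idea above, the following sketch uses SSE2 intrinsics. The function name is hypothetical, and it assumes an x86 target with SSE2 and that v[0] is the low element (as it is when the array is loaded from little-endian memory).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add 1 to the low 32-bit element only. */
void inc_low_epi32(int32_t v[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    __m128i mask = _mm_cmpeq_epi32(x, x);  /* all-ones in every element */
    mask = _mm_srli_si128(mask, 12);       /* byte-shift right by 12: only the low 4 bytes stay -1 */
    x = _mm_sub_epi32(x, mask);            /* low element: x - (-1); other elements: x - 0 */
    _mm_storeu_si128((__m128i *)v, x);
}
```

In a real loop the mask would be built once outside the loop, as in the assembly version.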

If your attempt was supposed to increment just the low 16-bit element of xmm2, then yes, it was a stupid attempt. I don't know what you're doing by storing into [rbx+8] and then loading into xmm1 (which zeroes the high 96 bits).

Here's how to write the xmm -> gp -> xmm round trip in a less dumb way. (Still terrible compared to paddw with a vector constant).

; Don't push/pop. Instead, pick a register you can clobber without saving/restoring.
movd    edx, xmm2       ; cheapest way to get the low 16 bits; element 1 arrives as garbage in the high half of edx, which doesn't matter
inc     edx             ; we only care about dx, but this is still the most efficient instruction
pinsrw  xmm2, edx, 0    ; normally you'd just use movd again, but we want to merge with the old contents

If you wanted to work with elements other than 16-bit, you'd either use SSE4.1 pinsrb/pinsrd/pinsrq, or you'd use movd and shuffles.
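The xmm -> gp -> xmm round trip for the low 16-bit element can be sketched in C with SSE2 intrinsics as follows. The function name is hypothetical and an x86 target with SSE2 is assumed. It is shown only to illustrate the data flow; as noted above, a vector add with a constant is the better choice.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: increment only the low 16-bit element by
 * going through a general-purpose register. */
void inc_low_epi16_via_gp(int16_t v[8])
{
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* Like movd: grab the low 32 bits. The upper half holds element 1,
     * which we do not care about. Unsigned avoids signed overflow. */
    unsigned low = (unsigned)_mm_cvtsi128_si32(x);
    /* Like pinsrw: only the low 16 bits of the integer are inserted,
     * so a carry out of bit 15 never reaches element 1. */
    x = _mm_insert_epi16(x, (int)(low + 1u), 0);
    _mm_storeu_si128((__m128i *)v, x);
}
```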


See Agner Fog's Optimizing Assembly guide for more good tips on how to use SSE vectors, and the other links in the tag wiki.

In short, no, not in the way that you are thinking.

Under the original SSE, the XMM registers held only floating-point values, and there is no increment operation for floating point.

SSE2 did not add new registers; it added integer instructions that operate on the same XMM registers, but there is still no increment. These added operations were really intended for high-speed arithmetic, including such things as dot products, accurate products with rounding, etc.

An increment operation is something that you would expect to find applied to a general register or an accumulator.
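If the value in the register is a float (the question's use of movss suggests it might be), the closest thing to an increment is a scalar add of 1.0. A minimal sketch with SSE intrinsics follows; the function name is hypothetical and an x86 target with SSE is assumed.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: add 1.0f to the low float element only. */
void inc_low_ps(float v[4])
{
    __m128 x = _mm_loadu_ps(v);
    x = _mm_add_ss(x, _mm_set_ss(1.0f));  /* scalar add (like addss): the upper three elements pass through unchanged */
    _mm_storeu_ps(v, x);
}
```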

You might find this set of slides somewhat informative in terms of general overview and function.

