I was wondering, is there a way to increment a value in an xmm register, or can you only move a value into one?
What I mean is, you can do this:
inc eax
or like this:
inc [ebp+7F00F000]
Is there a way to do the same with an xmm register?
I have tried something to resemble it, but it doesn't work:
inc [rbx+08]
movss xmm1,[rbx+08]
I have even tried something really stupid but it also didn't work
push edx
pextrw edx,xmm2,0
add edx,1
mov [rbx+08],edx
movss xmm1,[rbx+08]
pop edx
There's no inc equivalent for xmm regs, and there's no immediate-operand form of paddw (so there's no equivalent to add eax, 1 either).
paddw (and other element sizes) are only available with xmm/m128 source operands. So if you want to increment one element of a vector, you need to load a constant from memory, or generate it on the fly.
e.g. the cheapest way to increment all (16-bit) elements of xmm0 is:
; outside the loop
pcmpeqw xmm1,xmm1 ; xmm1 = all-ones = -1
; inside the loop
psubw xmm0, xmm1 ; xmm0 -= -1 (in each element). i.e. xmm0++
Or
paddw xmm0, [ones] ; where ones is a static constant.
It's probably only a good idea to load the constant from memory if it takes more than maybe two instructions to construct, or if register pressure is a problem.
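For anyone working from C rather than asm, the two options above can be sketched with SSE2 intrinsics. This is just an illustration: the function names inc_all_epi16 and inc_all_epi16_const are made up for this example, and a compiler is free to pick different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add 1 to each of the eight 16-bit lanes,
 * using a constant generated on the fly. */
static inline __m128i inc_all_epi16(__m128i v)
{
    /* pcmpeqw x,x: a register always equals itself, so every lane becomes -1 */
    __m128i minus_one = _mm_cmpeq_epi16(v, v);
    /* psubw: v - (-1) == v + 1 in each lane (wraps around on overflow) */
    return _mm_sub_epi16(v, minus_one);
}

/* Hypothetical helper: same result, but with a vector constant of ones
 * (the compiler decides whether to load it from memory or build it). */
static inline __m128i inc_all_epi16_const(__m128i v)
{
    /* paddw with a vector of eight 1s */
    return _mm_add_epi16(v, _mm_set1_epi16(1));
}
```

In a loop you'd want the constant created once outside the loop, which a compiler will normally do for you after inlining.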
If you want to construct a constant to increment only the low 32-bit element, for example, you might use a byte-shift to zero the other elements:
; hoisted out of the loop
pcmpeqw xmm1,xmm1 ; xmm1 = all-ones = -1
psrldq xmm1, 12 ; xmm1 = [ 0 0 0 -1 ]
; in the loop
psubd xmm0, xmm1
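The same low-element trick, sketched with SSE2 intrinsics in C. The name inc_low_epi32 is hypothetical, and the comments only say which instruction each intrinsic usually corresponds to.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add 1 to the low 32-bit lane only. */
static inline __m128i inc_low_epi32(__m128i v)
{
    /* pcmpeqw x,x: all bits set */
    __m128i all_ones = _mm_cmpeq_epi16(v, v);
    /* psrldq by 12 bytes: only the low 4 bytes stay set, i.e. [ 0 0 0 -1 ] */
    __m128i low_minus_one = _mm_srli_si128(all_ones, 12);
    /* psubd: lane 0 gets v - (-1); the other lanes subtract 0 and stay unchanged */
    return _mm_sub_epi32(v, low_minus_one);
}
```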
If your attempt was supposed to increment just the low 16-bit element in xmm2, then yes, it was a stupid attempt. IDK what you're doing storing into [rbx+8] and then loading into xmm1 (zeroing the high 96 bits).
Here's how to write the xmm -> gp -> xmm round trip in a less dumb way. (Still terrible compared to paddw with a vector constant).
; don't push/pop. Instead, pick a register you can clobber without saving/restoring
movd edx, xmm2 ; this is the cheapest way to get the low 16 bits. It doesn't matter that we also get element 1 as garbage in the high half of edx
inc edx ; we only care about dx, but this is still the most efficient instruction
pinsrw xmm2, edx, 0 ; normally you'd just use movd again, but we actually want to merge with the old contents.
If you wanted to work with elements other than 16-bit, you'd either use SSE4.1 pinsrb/d/q, or you'd use movd and shuffles.
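As a rough C equivalent of the movd / inc / pinsrw round trip above, here is a sketch with SSE2 intrinsics. The name inc_low_epi16_roundtrip is made up for the example. The mask is only there to keep the C arithmetic well-defined, since pinsrw ignores the upper half of the integer anyway.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: increment only the low 16-bit lane by bouncing
 * through a general-purpose register. */
static inline __m128i inc_low_epi16_roundtrip(__m128i v)
{
    /* movd: grabs the low 32 bits, so lane 1 comes along in the high half */
    unsigned low32 = (unsigned)_mm_cvtsi128_si32(v);
    /* inc: a carry out of the low 16 bits may spill into the high half,
     * but that half is thrown away below */
    low32 += 1u;
    /* pinsrw: merge just the low 16 bits back into lane 0, keeping lanes 1..7 */
    return _mm_insert_epi16(v, (int)(low32 & 0xFFFFu), 0);
}
```

Note that lane 1 is untouched even when lane 0 wraps from 0xFFFF to 0, because only 16 bits are inserted back.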
See Agner Fog's Optimize Assembly guide for more good tips on how to use SSE vectors. Also other links in the x86 tag wiki.
In short, no, not in the way that you are thinking.
Under the original SSE, the XMM registers were only used for floating point operations. There is no increment operation for floating point.
SSE2 added a number of integer instructions that operate on the same XMM registers, but there is still no increment. These added operations were really intended for high speed arithmetic, including such things as dot products, accurate products with rounding, etc.
An increment operation is something that you would expect to find applied to a general register or an accumulator.
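If the value in question is actually a float (the movss in the question hints at that, but it's only a guess), the closest thing to an increment is a scalar add of 1.0. A minimal sketch with SSE intrinsics, where inc_low_ps is a made-up name:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: add 1.0f to the low float lane only. */
static inline __m128 inc_low_ps(__m128 v)
{
    /* addss: lane 0 becomes v[0] + 1.0f, the upper three lanes are copied from v */
    return _mm_add_ss(v, _mm_set_ss(1.0f));
}
```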
You might find this set of slides somewhat informative in terms of general overview and function.