Count of negative numbers in assembly
dosseg
.model small
.stack 100h
.data
array db -1, -2, -3, -4, 1,2, 3, -5
.code
main PROC
mov ax, @data
mov ds, ax
xor ax, ax
xor dx, dx ; reset dx
lea si, array
mov cx, 8
back:
mov bl, [si]
cmp al, bl
jc continue ; carry will be generated if number in bl is positive
inc dx
continue:
inc si
clc
loop back
mov ah, 4ch
int 21h
main ENDP
end main
I wrote the above program to find the number of negative integers in an array.
Debugging showed that when SI is pointing at -1, the carry flag becomes 1, but it should not: at that instant the value in BL is FFh (negative) and AL is 00h, so subtracting a negative number from 0 should not generate a carry. What am I doing wrong?
Edit: I replaced the erroneous part with:
test bl, bl
jns continue
and now it works as expected, but I still don't know why the cmp method did not work.
When you compare al=0 with bl, the carry flag (alias Below flag) will be set for any value in bl except bl=0, because 0 is below any unsigned number in the range 0x01..0xFF.
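As a rough illustration, the carry behaviour described above can be modelled in a few lines of Python. This is only a sketch: `cmp_carry` is a hypothetical helper (not a real API), and it assumes the usual rule that an 8-bit cmp a, b sets CF exactly when a is below b as unsigned values.

```python
def cmp_carry(a: int, b: int) -> int:
    """CF after an 8-bit `cmp a, b`: 1 when a - b borrows, i.e. a < b unsigned."""
    return 1 if (a & 0xFF) < (b & 0xFF) else 0

# The array from the question, stored as raw two's-complement bytes.
array = [v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5)]

# cmp al, bl with al = 0: CF is set for every element, positive or negative.
flags = [cmp_carry(0, b) for b in array]
print(flags)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Only an element equal to 0 would leave CF clear, which is why the flag says nothing about the sign.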
Your array contains 8-bit signed integers. When we compare signed numbers, instead of the terms below|above we use less|greater, which take the sign flag (SF) and the overflow flag (OF) into account.
Instead of
jc continue ; carry will be generated if number in bl is positive
use
jle continue ; Jump if al=bl or if SF<>OF
A more readable solution is to replace cmp al,bl with
test bl,bl
jns continue ; Skip incrementing dx when the sign bit of bl is not set.
See also Jcc. You might output the result from DX using the returned errorlevel: just put mov al,dl before int 21h.
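To see that both suggested branches skip exactly the non-negative bytes, here is a small Python model of the flags involved. It is a sketch under stated assumptions: `cmp_flags`, `flags_after_test`, `jle_taken` and `jns_taken` are hypothetical helper names, and only CF, ZF, SF and OF are modelled.

```python
def cmp_flags(a: int, b: int) -> dict:
    """Flags after an 8-bit `cmp a, b` (computes a - b, discards the result)."""
    a &= 0xFF
    b &= 0xFF
    r = (a - b) & 0xFF
    return {
        "CF": int(a < b),                 # unsigned borrow
        "ZF": int(r == 0),
        "SF": r >> 7,                     # sign bit of the result
        "OF": ((a ^ b) & (a ^ r)) >> 7,   # signed overflow of a - b
    }

def flags_after_test(x: int) -> dict:
    """Flags after `test x, x`: x AND x, with CF and OF always cleared."""
    x &= 0xFF
    return {"CF": 0, "ZF": int(x == 0), "SF": x >> 7, "OF": 0}

def jle_taken(flags: dict) -> bool:
    return bool(flags["ZF"]) or flags["SF"] != flags["OF"]

def jns_taken(flags: dict) -> bool:
    return flags["SF"] == 0

array = [v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5)]
# The increment runs only when the jump to `continue` is NOT taken.
count_jle = sum(not jle_taken(cmp_flags(0, b)) for b in array)
count_jns = sum(not jns_taken(flags_after_test(b)) for b in array)
print(count_jle, count_jns)  # 5 5
```

In this model the two conditions agree for every possible byte value, not just for the eight elements of the array.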
test reg,reg / jns non_negative (not sign-bit-set) or jnl non_negative (not less-than) are equivalent after a compare with zero.
That uses the FLAGS and conditions for their normal semantic meaning, i.e. doing a normal signed compare.
(test same,same is equivalent to cmp against zero, always clearing OF and CF, and is a well-known optimization for cmp reg, 0.)
What you're doing doesn't set CF in a way that reflects the sign bit, so a jc (jump if CF set) isn't useful. CF is set for every non-zero number, i.e. wherever 0U < (unsigned)x is true, so your jc skips the increment for all of them and only zeros would be counted.
It's only interesting to get your condition into CF if you're going to take advantage of that by using adc dx, 0 or sbb dx, -1 to conditionally increment DX (when CF is 1 or 0, respectively).
The sbb version is like dx -= -1 + CF, so CF either cancels out the -1, or you subtract -1, i.e. add 1.
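The arithmetic of the two branchless increments can be checked with a tiny Python model. This is a sketch only: `adc` and `sbb` here are hypothetical 16-bit helpers that return just the destination value and do not model the flags the real instructions write.

```python
MASK16 = 0xFFFF

def adc(dst: int, src: int, cf: int) -> int:
    """16-bit `adc dst, src`: dst + src + CF, wrapped to 16 bits."""
    return (dst + src + cf) & MASK16

def sbb(dst: int, src: int, cf: int) -> int:
    """16-bit `sbb dst, src`: dst - (src + CF), wrapped to 16 bits."""
    return (dst - (src + cf)) & MASK16

MINUS_ONE = -1 & MASK16  # the immediate -1 as a 16-bit pattern (0xFFFF)

for cf in (0, 1):
    print(cf, adc(10, 0, cf), sbb(10, MINUS_ONE, cf))
# 0 10 11
# 1 11 10
```

So adc dx, 0 increments when CF=1, and sbb dx, -1 increments when CF=0.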
One way to get CF set according to the sign bit of a byte is simply to shift it out, e.g. shl bl, 1, if you don't mind destroying the value in BL. Equivalently, add bl,bl is also a 2-byte instruction but can run on more execution units on modern CPUs. (They both set FLAGS the same way, including CF.)
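A quick exhaustive check of that equivalence, as a Python sketch. The helpers `shl1` and `add_self` are hypothetical names, and the model tracks only the 8-bit result and CF, not the other flags.

```python
def shl1(x: int) -> tuple:
    """(result, CF) of an 8-bit `shl x, 1`: CF receives the old bit 7."""
    x &= 0xFF
    return (x << 1) & 0xFF, x >> 7

def add_self(x: int) -> tuple:
    """(result, CF) of an 8-bit `add x, x`: CF is the carry out of bit 7."""
    x &= 0xFF
    total = x + x
    return total & 0xFF, total >> 8

same = all(shl1(x) == add_self(x) for x in range(256))
print(same)          # True
print(shl1(0xFF))    # (254, 1): -1 has its sign bit set
print(shl1(0x7F))    # (254, 0): 127 does not
```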
It's not possible with a compare against zero. 0 - x always has a borrow (CF=1) for any non-zero x, and x - 0 never has carry-out.
Without modifying the register value, it is possible with cmp, though: 0x7f - x has unsigned wrapping (i.e. a borrow output that sets CF) for x >= 0x80 unsigned, i.e. for values with their MSB set.
xor dx, dx ; count = 0
mov si, OFFSET array ; LEA takes more bytes than mov-immediate. Only use LEA when the address involves a register (or for x86-64 RIP-relative)
;;; The interesting part
mov al, 0x7f ; outside the loop
back: ; do {
cmp al, [si] ; CF = 0x7F < (unsigned)[SI], i.e. MSB set in [si]
adc dx, 0 ; count negative values
;;; then the rest of the loop
inc si
cmp si, OFFSET array+8 ; the LOOP instruction isn't fast on most modern CPUs, and we're hard-coding the array length anyway. Or just put a label at the end of the array and use that.
jne back ; } while (p != endp)
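For readers who want to trace the loop without a debugger, here is a Python model of the cmp / adc pair. It is a sketch: `count_negative` is a hypothetical function name, and the model tracks only DX and CF.

```python
def count_negative(data: bytes) -> int:
    """Model of: mov al, 0x7f / loop { cmp al, [si] ; adc dx, 0 ; inc si }."""
    dx = 0
    al = 0x7F
    for byte in data:                    # inc si steps through the array
        cf = int(al < byte)              # cmp al, [si]: borrow when 0x7F < byte (unsigned)
        dx = (dx + 0 + cf) & 0xFFFF      # adc dx, 0
    return dx

array = bytes(v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5))
print(count_negative(array))  # 5
```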
You don't need clc in this or your version. CF isn't "sticky"; anything that updates its value sets it to 0 or 1 regardless of the old value. And it's not an input for cmp.
Unfortunately, we can't set CF=1 for bl < 0 (aka bl >= 0x80U) with cmp bl, constant. It only works the way you're doing it, setting another register to compare against. (cmp reg, 123 exists, cmp 123,reg doesn't; most 2-operand instructions modify their destination and wouldn't make sense with an immediate destination, so it would be a special case to have yet another opcode for cmp in the other direction.)
But you can do cmp bl, 0x80 to set CF when bl < 0x80 (unsigned), i.e. when its sign bit isn't set, which leaves CF clear exactly for the negative values.
cmp byte ptr [si], 0x80 ; CF = [si] < (unsigned)0x80, i.e. non-negative
sbb dx, -1 ; count when CF=0, negative values
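The claim about cmp bl, constant can be checked by brute force over all 256 constants. The Python sketch below assumes the standard rule that cmp x, c sets CF when x is below c as unsigned bytes; `cf_after_cmp` is a hypothetical helper name.

```python
def cf_after_cmp(x: int, c: int) -> int:
    """CF after an 8-bit `cmp x, c`: set when x < c as unsigned bytes."""
    return int((x & 0xFF) < (c & 0xFF))

# What we would like CF to be: 1 exactly for bytes with the sign bit set.
wanted = [int(x >= 0x80) for x in range(256)]

# Constants for which CF itself equals the sign bit: none exist.
direct = [c for c in range(256)
          if [cf_after_cmp(x, c) for x in range(256)] == wanted]

# Constants for which the inverted CF equals the sign bit: only 0x80.
inverted = [c for c in range(256)
            if [1 - cf_after_cmp(x, c) for x in range(256)] == wanted]

print(direct)    # []
print(inverted)  # [128]
```

That is why the 0x80 version has to count with sbb dx, -1 (increment when CF=0) rather than adc dx, 0.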
Loading the value into a register with mov bl, [si] can be helpful for debugging, making it show up in your debugger's register window instead of having to examine memory. But that's not necessary; cmp works with reg or memory operands (or an immediate), saving an instruction.
As a further optimization for code size inside the loop, scasb is equivalent to cmp al, es:[di] / inc di (but the inc part doesn't set FLAGS). And it's actually dec di if DF is set, so you'd want cld somewhere in your program before a loop using "string" instructions, to make sure they go in the forward direction.
Using scasb means you need to use AL for that. Without scasb, you could count into AL inside the loop, where it could be the exit status for your DOS call. (Perhaps that's why you were trying to use AL=0, if you wanted to exit(0) instead of returning a value.)
scasb isn't particularly fast on modern CPUs, but it is on a real 8086; so is the loop instruction, because they're both compact in code size. loop is a special-case optimization for dec cx / jnz (but also without affecting FLAGS).
Or with 386 instructions, use bt word ptr [si], 7 to Bit Test that bit, putting the result in CF where you can adc dx, 0. bt is slow on modern CPUs in its bt mem, reg form (like 10 uops) because it can index outside the word selected by the addressing mode. So it would be less efficient to put bt word ptr [array], cx in a loop with cx initially = 7, incrementing with add cx, 8 inside the loop. But that would work.
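The bit-string indexing of bt mem, reg can be modelled as below. This is a sketch with assumptions: `bt_mem` is a hypothetical helper, it handles only non-negative bit indices, and it treats memory as a little-endian byte string, so bit n of the string is bit n mod 8 of the byte at offset n div 8.

```python
def bt_mem(memory: bytes, base: int, bit_index: int) -> int:
    """CF after `bt word ptr [base], reg` for a non-negative bit index."""
    byte_offset, bit = divmod(bit_index, 8)
    return (memory[base + byte_offset] >> bit) & 1

array = bytes(v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5))

dx = 0
cx = 7                           # bit 7 of the first byte
for _ in range(len(array)):
    dx += bt_mem(array, 0, cx)   # bt word ptr [array], cx / adc dx, 0
    cx += 8                      # add cx, 8: sign bit of the next byte
print(dx)  # 5
```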
bt is not too bad with bt mem, imm: only 2 uops on most modern Intel and 1 on some AMD (https://uops.info/). It's only a single uop for bt reg, imm or bt reg,reg, like cmp, if you want to load first. (It can't macro-fuse with branches into a single uop, so if branching instead of using adc, a cmp / jle would be more efficient as well as more readable.) On AMD, bts / btr / btc, which also modify the bit, are slower than bt even for reg,reg, decoding to extra uops.
The extra fun way, since you have exactly 8 bytes, uses SSE2 and popcnt. (Yes, this can work in 16-bit real mode, unlike AVX. In a bootloader and maybe DOS you'd have to manually enable the control-register bits that make SSE instructions not fault. Of course it only works on CPUs with popcnt, like Nehalem and later from 2008 or so; otherwise use pcmpgtb / psadbw / movq for just SSE2, or SSE1 using MMX registers.)
movq xmm0, qword ptr [array] ; load 8 bytes (zero-extending to a 16-byte XMM reg)
pmovmskb ax, xmm0 ; pack the sign bit of each byte into an integer reg
popcnt ax, ax ; count set bits = sign bits of the bytes
This would also work easily for 4- or 16-byte arrays; for other compile-time-constant sizes, do 2 loads and shift out the overlapping bytes.
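What the three instructions compute can be sketched in Python. The helpers `movq_load` and `pmovmskb` are hypothetical models of the instructions with the same names, not bindings to real SIMD intrinsics, and the popcnt step is modelled by counting the 1 bits in the mask.

```python
def movq_load(data: bytes) -> bytes:
    """Model of `movq xmm0, qword ptr [mem]`: 8 bytes, zero-extended to 16."""
    return bytes(data[:8]) + bytes(8)

def pmovmskb(vec16: bytes) -> int:
    """Model of pmovmskb: bit i of the result is the sign bit of byte i."""
    assert len(vec16) == 16
    mask = 0
    for i, b in enumerate(vec16):
        mask |= (b >> 7) << i
    return mask

array = bytes(v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5))
mask = pmovmskb(movq_load(array))
count = bin(mask).count("1")     # popcnt
print(hex(mask), count)  # 0x8f 5
```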
For other element sizes, there's movmskps (dword) and movmskpd (qword).
With a larger array, you'd want to start accumulating counts in vector regs: pcmpgtb to compare for 0 > x, then psubb xmm1, xmm0 to do total -= (0 or -1), for up to 255 iterations of 16 bytes. Then accumulate with psadbw against zero. This is the same problem as How to count character occurrences using SIMD, but replacing pcmpeqb with pcmpgtb.