Counting negative numbers in assembly

Question

dosseg
.model small
.stack 100h
.data
    array db -1, -2, -3, -4, 1,2, 3, -5
.code
main PROC 
    mov ax, @data
    mov ds, ax
    xor ax, ax
    xor dx, dx ; reset dx 
    lea si, array
    mov cx, 8
    back: 
      mov bl, [si]
      cmp al, bl
      jc continue ; carry will be generated if number in bl is positive
      inc dx
      continue: 
        inc si
        clc
    loop back
    mov ah, 4ch
    int 21h
main ENDP 
end main

I wrote the above program to find the number of negative integers in an array.
Debugging showed that when SI points at -1, the carry flag becomes 1. I expected it to stay clear: at that instant BL holds FFh (negative) and AL holds 00h, so subtracting a negative number from 0 should not generate a carry. What am I doing wrong?

Edit: I replaced the erroneous part with:

 test bl, bl 
 jns continue

and now it works as expected, but I still don't know why the cmp method did not work.

Answer 1

When you compare al=0 with bl, the carry flag (alias the Below flag) is set for every value of bl except bl=0, because 0 is below every unsigned number in the range 0x01..0xFF.
Your array contains 8-bit signed integers. When comparing signed numbers, we use the terms less|greater instead of below|above; those conditions take the Sign flag and the Overflow flag into account.

Instead of
jc continue ; carry will be generated if number in bl is positive
use
jle continue ; Jump if al=bl or if SF<>OF
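To make the difference between the two conditions concrete, here is a minimal C sketch that models the flags of an 8-bit cmp. The names flags8, cmp8, cond_c, cond_le and count_skipping are made up for this illustration, and only CF, ZF, SF and OF are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags left by an 8-bit "cmp a, b" (computes a - b, discards the result). */
typedef struct { bool cf, zf, sf, of; } flags8;

static flags8 cmp8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    flags8 f;
    f.cf = a < b;                              /* unsigned borrow */
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;                    /* sign bit of the result */
    f.of = (((a ^ b) & (a ^ r)) & 0x80) != 0;  /* signed overflow of a - b */
    return f;
}

static bool cond_c(flags8 f)  { return f.cf; }                   /* jc / jb */
static bool cond_le(flags8 f) { return f.zf || (f.sf != f.of); } /* jle */

/* Same shape as the loop in the question: cmp al, bl with al = 0,
   then skip "inc dx" whenever the chosen condition is true. */
static int count_skipping(const int8_t *p, int n, bool (*skip)(flags8)) {
    int dx = 0;
    for (int i = 0; i < n; i++) {
        flags8 f = cmp8(0, (uint8_t)p[i]);
        if (!skip(f)) dx++;
    }
    return dx;
}
```

Calling count_skipping with cond_c on the array from the question counts nothing, because every element is non-zero and sets CF; with cond_le it counts the five negative elements.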

A more readable solution is to replace cmp al,bl with
test bl,bl
jns continue ; Skip incrementing dx when the sign bit of bl is not set.

See also Jcc. You could return the result from DX as the program's errorlevel: just mov al,dl before int 21h.

Answer 2

If you just want to branch, use sign-flag or signed-compare conditions

test reg,reg / jns non_negative (sign bit not set) or jnl non_negative (not less-than) are equivalent after a compare with zero.

That uses the FLAGS and conditions for their normal semantic meaning, i.e. doing a normal signed compare.
(test same,same is equivalent to cmp against zero, always clearing OF and CF, and is a well-known optimization for cmp reg, 0.)
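As a sanity check of that equivalence, here is a small C sketch that walks all 256 byte values. The helper names (test8, cmp8_zero, cond_ns, cond_nl, jns_and_jnl_agree) are invented for the example, and it models only CF, ZF, SF and OF.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } flags8;

/* test a, b: flags come from a AND b; CF and OF are always cleared. */
static flags8 test8(uint8_t a, uint8_t b) {
    uint8_t r = a & b;
    flags8 f = { false, r == 0, (r & 0x80) != 0, false };
    return f;
}

/* cmp a, 0: flags come from a - 0, which can neither borrow nor overflow. */
static flags8 cmp8_zero(uint8_t a) {
    flags8 f = { false, a == 0, (a & 0x80) != 0, false };
    return f;
}

static bool cond_ns(flags8 f) { return !f.sf; }        /* jns */
static bool cond_nl(flags8 f) { return f.sf == f.of; } /* jnl / jge */

/* True if, for every byte value x, "test x,x" and "cmp x,0" give the same
   modelled flags and both jns and jnl mean "sign bit clear". */
static bool jns_and_jnl_agree(void) {
    for (unsigned x = 0; x <= 0xFF; x++) {
        flags8 t = test8((uint8_t)x, (uint8_t)x);
        flags8 c = cmp8_zero((uint8_t)x);
        bool non_negative = (x & 0x80) == 0;
        if (t.cf != c.cf || t.zf != c.zf || t.sf != c.sf || t.of != c.of)
            return false;
        if (cond_ns(t) != non_negative || cond_nl(t) != non_negative)
            return false;
    }
    return true;
}
```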

What you're doing doesn't set CF in a way that reflects the sign bit, so a jc (jump if CF set) isn't useful. cmp al, bl with al=0 sets CF for every non-zero byte, the ones where 0U < (unsigned)x is true, so your jc skips the inc for all of them and the loop only counts zero bytes.

Getting the carry flag set according to the MSB

It's only interesting to get your condition into CF if you're going to take advantage of that
by using adc dx, 0 or sbb dx, -1 to conditionally increment DX (when CF is 1 or 0, respectively).

The sbb version is like dx -= -1 + CF, so CF either cancels out the -1, or you subtract -1, i.e. add 1.
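The arithmetic can be spelled out in C with 16-bit wraparound. This is only a sketch; adc16 and sbb16 are hypothetical helper names that model the result value, not the flags the instructions produce.

```c
#include <stdbool.h>
#include <stdint.h>

/* adc dst, src: dst = dst + src + CF (mod 2^16) */
static uint16_t adc16(uint16_t dst, uint16_t src, bool cf) {
    return (uint16_t)(dst + src + cf);
}

/* sbb dst, src: dst = dst - (src + CF) (mod 2^16) */
static uint16_t sbb16(uint16_t dst, uint16_t src, bool cf) {
    return (uint16_t)(dst - (uint16_t)(src + cf));
}
```

With src = 0xFFFF (that is, -1), sbb16 adds 1 when CF is 0 and leaves the value alone when CF is 1, while adc16 with src = 0 adds 1 only when CF is 1.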

One way to get CF set according to the sign bit of a byte is simply to shift it out, e.g. shl bl, 1, if you don't mind destroying the value in BL. Equivalently, add bl,bl is also a 2-byte instruction but can run on more execution units on modern CPUs. (They both set FLAGS the same way, including CF.)

It's not possible with a compare against zero. 0 - x always has a borrow (CF=1) for any non-zero x, and x - 0 never has carry-out.

Without modifying the register value, it is possible with cmp, though: 0x7f - x has unsigned wrapping (i.e. a borrow output that sets CF) for x >= 0x80 unsigned, i.e. for values with their MSB set.

   xor dx, dx              ; count = 0
   mov si, OFFSET array    ; LEA takes more bytes than mov-immediate.  Never use LEA without a register, except for x86-64 RIP-relative

;;;  The interesting part
   mov  al, 0x7f           ; outside the loop
back:                      ; do {
   cmp  al, [si]             ; CF = 0x7F <(unsigned)[SI].  i.e. MSB set in [si]
   adc  dx, 0                ; count negative values
;;;  then the rest of the loop

   inc  si
   cmp  si, OFFSET array+8   ; the LOOP instruction isn't fast on most modern CPUs, and we're hard-coding the array length anyway.  Or just put a label at the end of it and use that.
   jne  back                ; }while(p != endp)
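For readers who prefer to see the loop in C, here is a rough model of what it computes. The names cf_after_cmp and count_negative_adc are made up for this sketch, and the pointer pair stands in for SI and OFFSET array+8.

```c
#include <stdbool.h>
#include <stdint.h>

/* CF after "cmp a, b": set when a - b needs a borrow, i.e. a < b unsigned. */
static bool cf_after_cmp(uint8_t a, uint8_t b) { return a < b; }

/* Mirrors the asm loop: AL holds 0x7F, DX accumulates CF via "adc dx, 0".
   Like the asm do/while, it assumes at least one element. */
static uint16_t count_negative_adc(const uint8_t *p, const uint8_t *endp) {
    const uint8_t al = 0x7f;              /* mov al, 0x7f (outside the loop) */
    uint16_t dx = 0;                      /* xor dx, dx */
    do {
        bool cf = cf_after_cmp(al, *p);   /* cmp al, [si] */
        dx = (uint16_t)(dx + 0 + cf);     /* adc dx, 0 */
        p++;                              /* inc si */
    } while (p != endp);                  /* cmp si, end / jne back */
    return dx;
}
```

Feeding it the bytes of the question's array (FFh, FEh, FDh, FCh, 1, 2, 3, FBh) gives 5.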

You don't need clc in this or your version. CF isn't "sticky"; anything that updates its value sets it to 0 or 1 regardless of the old value. And it's not an input for cmp.

We can't set CF=1 for bl < 0 (aka bl >= 0x80U) with cmp bl, constant, unfortunately. It only works the way you're doing it, with the constant held in another register as the first operand. (cmp reg, 123 exists, cmp 123,reg doesn't; most 2-operand instructions modify their destination and wouldn't make sense with an immediate destination, so it would be a special case to have yet another opcode for cmp in the other direction.)

But you can do cmp bl, 0x80 to set CF when bl < 0x80 (unsigned), i.e. when its sign bit isn't set, which leaves CF clear for negative values.

   cmp  byte ptr [si], 0x80        ; CF = [si] < (unsigned)0x80, i.e. non-negative
   sbb  dx, -1                     ; count when CF=0, negative values
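The CF tricks in this section can be checked exhaustively over all byte values. This C sketch does that; the function names are invented for the illustration, and each helper models only the carry flag of the instruction named in its comment.

```c
#include <stdbool.h>
#include <stdint.h>

static bool cf_shl1(uint8_t x)     { return (x & 0x80) != 0; }        /* shl x, 1: bit shifted out */
static bool cf_add_self(uint8_t x) { return (unsigned)x + x > 0xFF; } /* add x, x: carry-out */
static bool cf_cmp(uint8_t a, uint8_t b) { return a < b; }            /* cmp a, b: borrow of a - b */

/* True if every trick agrees with the sign bit for all 256 byte values. */
static bool cf_tricks_agree(void) {
    for (unsigned v = 0; v <= 0xFF; v++) {
        uint8_t x = (uint8_t)v;
        bool msb = (x & 0x80) != 0;
        if (cf_shl1(x) != msb) return false;
        if (cf_add_self(x) != msb) return false;
        if (cf_cmp(0x7f, x) != msb) return false;   /* cmp al=0x7f, x  -> CF = MSB  */
        if (cf_cmp(x, 0x80) != !msb) return false;  /* cmp x, 0x80     -> CF = !MSB */
        if (cf_cmp(0, x) != (x != 0)) return false; /* cmp al=0, x     -> CF = (x != 0) */
    }
    return true;
}
```

The last check is the comparison from the question: it tracks "non-zero", not "negative".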

Loading the value into a register with mov bl, [si] can be helpful for debugging, making it show up in your debugger's window of registers instead of having to examine memory. But that's not necessary; cmp works with reg or memory operands (or an immediate), saving an instruction.

As a further optimization for code-size inside the loop, scasb is equivalent to cmp al, es:[di] / inc di (but the inc part doesn't set FLAGS). And it's actually dec di if DF is set, so you'd want cld somewhere in your program before a loop using "string" instructions to make sure they go in the forward direction.

Using scasb means you need to use AL for that. Without scasb, you could count into AL inside the loop, where it could be the exit status for your DOS call. (Perhaps that's why you were trying to use AL=0, if you wanted to exit(0) instead of returning a value.)

scasb isn't particularly fast on modern CPUs, but it is on real 8086; so is the loop instruction, because they're both compact code-size. loop is a special-case optimization for dec cx / jnz (but also without affecting FLAGS).
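A loose C model of a scasb-based counting loop might look like the following. The struct cpu_state and both function names are hypothetical, segmentation (ES) is not modelled, and only CF and ZF are tracked.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t al;
    const uint8_t *di;   /* stands in for ES:DI */
    bool df, cf, zf;
} cpu_state;

/* scasb: compare AL with the byte at [DI], then step DI by +1 (DF=0)
   or -1 (DF=1). The pointer step does not touch the flags. */
static void scasb(cpu_state *s) {
    uint8_t m = *s->di;
    s->cf = s->al < m;
    s->zf = s->al == m;
    s->di += s->df ? -1 : 1;
}

/* Counts bytes with the MSB set; assumes n > 0, like a loop ending in "loop". */
static unsigned count_negative_scasb(const uint8_t *p, unsigned n) {
    cpu_state s = { 0x7f, p, false, false, false };  /* mov al, 0x7f ; cld */
    unsigned dx = 0;
    for (unsigned cx = n; cx != 0; cx--) {  /* loop back: dec cx / jnz, FLAGS untouched */
        scasb(&s);                          /* CF = 0x7F < [di] */
        dx += s.cf;                         /* adc dx, 0 */
    }
    return dx;
}
```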

Or with 386 instructions, use bt word ptr [si], 7 to Bit Test that bit, putting the result in CF where you can adc dx, 0. bt is slow on modern CPUs with bt mem, reg (like 10 uops) because it can index outside the word indexed by the addressing mode. So it would be less efficient to put bt word ptr [array], cx in a loop with cx initially = 7 and incrementing with add cx, 8 inside the loop. But that would work.

bt is not too bad with bt mem, imm, only 2 uops on most modern Intel and 1 on some AMD ( https://uops.info/ ). It's only a single uop for bt reg, imm or bt reg,reg, like cmp, if you want to load first. (It can't macro-fuse with branches into a single uop, so if branching instead of using adc, a cmp / jle would be more efficient as well as more readable.) On AMD, bts / btr / btc, which also modify the bit, are slower than bt even for reg,reg, decoding to extra uops.
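The bit-string indexing of bt mem, reg can be pictured with a short C sketch. bt_mem and count_negative_bt are made-up names; the model covers only non-negative bit offsets and reads single bytes, whereas the real instruction loads a whole word.

```c
#include <stdbool.h>
#include <stdint.h>

/* CF after "bt word ptr [base], bit" with a register bit index.
   The CPU picks bit (bit % 16) of the word at base + 2*(bit / 16); on a
   little-endian machine that is bit (bit % 8) of the byte at base + bit/8. */
static bool bt_mem(const uint8_t *base, unsigned bit) {
    return ((base[bit >> 3] >> (bit & 7)) & 1) != 0;
}

/* cx starts at 7 (the MSB of byte 0) and advances by 8 per element. */
static unsigned count_negative_bt(const uint8_t *array, unsigned n) {
    unsigned dx = 0;
    for (unsigned cx = 7; cx < 8 * n; cx += 8)   /* add cx, 8 */
        dx += bt_mem(array, cx);                 /* bt ... / adc dx, 0 */
    return dx;
}
```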

SSE2 + popcnt to check 4, 8, or 16 bytes at once

The extra fun way, since you have exactly 8 bytes, uses SSE2 and popcnt. (Yes, this can work in 16-bit real mode, unlike AVX. In a bootloader and maybe DOS you'd have to manually enable the control-register bits that make SSE instructions not fault. Of course it only works on CPUs with popcnt, like Nehalem and later from 2008 or so; otherwise use pcmpgtb / psadbw / movq for just SSE2, or SSE1 using MMX registers.)

  movq      xmm0, qword ptr [array]  ; load 8 bytes (zero-extending to a 16-byte XMM reg)
  pmovmskb  eax, xmm0                ; pack the sign bit of each byte into an integer reg (destination must be a 32-bit reg)
  popcnt    ax, ax                   ; count set bits = sign bits of the bytes

This would also work easily for 4- or 16-byte arrays. For other compile-time-constant sizes, do 2 loads and shift out the overlapping bytes.
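The same idea written with C intrinsics, as a sketch only: count_negative8 and popcount16 are names invented here, the popcount is done in portable C rather than with the popcnt instruction, and a scalar fallback that builds the same mask is used when the compiler does not define __SSE2__.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Portable stand-in for popcnt: clears the lowest set bit until none remain. */
static int popcount16(unsigned v) {
    int n = 0;
    while (v) { v &= v - 1; n++; }
    return n;
}

/* Counts the negative values among exactly 8 signed bytes. */
static int count_negative8(const int8_t *p) {
#if defined(__SSE2__)
    __m128i v = _mm_loadl_epi64((const __m128i *)p);  /* movq: 8 bytes, upper half zeroed */
    unsigned mask = (unsigned)_mm_movemask_epi8(v);   /* pmovmskb: one sign bit per byte */
#else
    unsigned mask = 0;                                /* scalar fallback building the same mask */
    for (int i = 0; i < 8; i++)
        mask |= (unsigned)((uint8_t)p[i] >> 7) << i;
#endif
    return popcount16(mask);
}
```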

For other element sizes, there's movmskps (dword) and movmskpd (qword).

With a larger array, you'd want to start accumulating counts in vector regs, like pcmpgtb to compare for 0 > x / psubb xmm1, xmm0 to do total -= (0 or -1), up to 255 iterations of 16 bytes. Then accumulate with psadbw against zero. Same problem as How to count character occurrences using SIMD, but replacing pcmpeqb with pcmpgtb.
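A possible C-intrinsics version of that accumulation scheme is sketched below. count_negative is a hypothetical name; the vector path is compiled only when __SSE2__ is defined, and a plain scalar loop handles the tail (or everything, without SSE2).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

static size_t count_negative(const int8_t *p, size_t n) {
    size_t total = 0, i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    while (n - i >= 16) {
        __m128i acc = zero;   /* 16 byte-wide counters */
        /* At most 255 iterations so no byte counter can wrap. */
        for (int iter = 0; iter < 255 && n - i >= 16; iter++, i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
            __m128i neg = _mm_cmpgt_epi8(zero, v);   /* pcmpgtb: 0 > x gives 0xFF (-1) */
            acc = _mm_sub_epi8(acc, neg);            /* psubb: acc -= (0 or -1) */
        }
        /* psadbw against zero: sums each group of 8 counters into a 64-bit lane. */
        __m128i sad = _mm_sad_epu8(acc, zero);
        total += (size_t)_mm_cvtsi128_si32(sad)
               + (size_t)_mm_extract_epi16(sad, 4);
    }
#endif
    for (; i < n; i++)        /* scalar tail */
        total += (p[i] < 0);
    return total;
}
```

The inner-loop cap of 255 matters only for arrays longer than 255 * 16 = 4080 bytes; below that a single psadbw at the end is enough.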

Answer 1
1 2022-04-29 19:11:21

Answer 2
0 2022-04-30 08:33:15
