How do AVR Assembly BRNE delay loops work?

Question

An online delay loop generator gave me this delay loop, which has a runtime of 0.5 s on a chip running at 16 MHz.

The questions on my mind are:

Do the branches keep branching if the register becomes negative?
How exactly does one calculate the values that are loaded in the beginning?
```
    ldi  r18, 41
    ldi  r19, 150
    ldi  r20, 128
L1: dec  r20
    brne L1
    dec  r19
    brne L1
    dec  r18
    brne L1
```

Answer 1

To answer your questions exactly:

1: The DEC instruction knows nothing about 'signed' numbers; it just decrements an 8-bit register. Two's complement arithmetic makes this work at the wraparound (0x00 -> 0xFF is the same bit pattern as 0 -> -1). DEC also sets the Z flag in the status register, which BRNE uses to decide whether to branch. BRNE branches whenever Z is clear, so after a wrap to 0xFF the loop simply keeps going until the register reaches 0 again.

2: You can see from the AVR manual that DEC is a single-cycle instruction. BRNE also takes a single cycle when not branching, and 2 cycles when branching. Therefore, to compute the time of your loop, you need to count the number of times each path is taken.
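Both points can be made concrete with a tiny model. The sketch below is an illustration in Python, not AVR code: `dec8`, `brne_cycles` and `as_signed` are hypothetical helper names, and the model covers only the 8-bit wraparound, the Z flag, and the 1-or-2-cycle cost of BRNE described above.

```python
def dec8(value):
    """Model DEC on an 8-bit register: return (new_value, z_flag)."""
    result = (value - 1) & 0xFF      # 0x00 wraps around to 0xFF
    return result, result == 0       # Z is set only when the result is zero


def brne_cycles(z_flag):
    """Model BRNE cost: 2 cycles when the branch is taken (Z clear), else 1."""
    return 1 if z_flag else 2


def as_signed(value):
    """Read the same 8 bits as a two's complement number."""
    return value - 256 if value & 0x80 else value


value, z = dec8(0x00)
print(hex(value), as_signed(value), z, brne_cycles(z))   # 0xff -1 False 2
```

Decrementing 0 leaves Z clear, so the branch is still taken; only a decrement from 1 to 0 sets Z and lets BRNE fall through.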

Consider a single DEC/BRNE loop:

    ldi r18, 0    ; ldi only accepts r16..r31; starting at 0 gives 256 iterations
L1: dec r18
    brne L1

This loop executes exactly 256 times: 256 cycles of DEC plus 511 cycles of BRNE (255 taken branches at 2 cycles each and one final fall-through at 1 cycle), for a total of 767 cycles. At 16MHz, that's about 48us.

Wrapping that in an outer delay loop:

    ldi r17, 10   ; outer counter (ldi only accepts r16..r31)
    ldi r18, 0    ; inner counter, 0 gives 256 iterations
L1: dec r18
    brne L1
    dec r17
    brne L1

You can see that the outer loop counter is decremented every time the inner loop counter hits 0. Each full pass of the inner loop costs 767 cycles (255 iterations at 3 cycles with the branch taken, plus a final 2-cycle iteration). In this example the inner loop therefore runs 10 x 256 times (10 x 767 cycles), and the outer DEC/BRNE runs 10 times (9 x 3 + 2 = 29 cycles), so the whole loop takes 10 x 767 + 29 = 7699 cycles, or roughly 481us at 16MHz. The same reasoning extends to 3 nested loops.

From here, it's straightforward to figure out how many times each loop should execute to achieve the desired delay. Pick the largest number of outer-loop iterations that takes less than the desired time, subtract that time, do the same for the next nested loop, and so on until the innermost loop fills up the small amount left.

Answer 2

How exactly does one calculate the values that are loaded in the beginning?

Calculate the total number of cycles: 0.5 s * 16000000 Hz = 8000000 cycles.

Know the total cycles of the r20 and r19 loops (from zero to zero). AVR registers are 8 bits wide, so a full loop is 256 iterations (dec of 0 gives 255). dec is 1 cycle. brne is 2 cycles when the branch is taken, 1 cycle when it is not.

So the innermost loop:

L1: dec  r20
    brne L1

From zero to zero (r20 = 0) this takes 255 * (1+2) + 1 * (1+1) = 767 cycles (255 times the branch is taken, 1 time it falls through).

The second loop, wrapping it with r19, then takes 255 * (767+1+2) + 1 * (767+1+1) = 197119 cycles.

A single r18 loop with the branch taken is then 197119+1+2 = 197122 cycles. (It is 197121 when the branch is not taken, which is the final exit of the delay loop; I will avoid this -1 with a trick in the next step.)
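The three per-level costs above follow one pattern, which can be written down once and reused. This is a sketch in Python; `full_loop_cycles` is a name made up for illustration, and it assumes the 1-cycle `dec` and 1-or-2-cycle `brne` timings stated above.

```python
DEC = 1          # cycles for dec
BRNE_TAKEN = 2   # cycles for brne when it branches
BRNE_EXIT = 1    # cycles for brne when it falls through


def full_loop_cycles(body=0):
    """Cycles for a dec/brne loop that starts at 0 (256 passes).

    `body` is the cost of whatever runs inside each pass before the dec.
    """
    taken = 255 * (body + DEC + BRNE_TAKEN)   # 255 passes that branch back
    last = 1 * (body + DEC + BRNE_EXIT)       # final pass that falls through
    return taken + last


r20_full = full_loop_cycles()            # innermost loop on its own
r19_full = full_loop_cycles(r20_full)    # r19 loop wrapping a full r20 loop
r18_pass = r19_full + DEC + BRNE_TAKEN   # one r18 iteration whose brne is taken

print(r20_full, r19_full, r18_pass)      # 767 197119 197122
```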

Now this is almost enough to calculate the initial r18. Let's first adjust the total cycles for the O(1) code, that is, the three ldi instructions, which take 1 cycle each: total2 = 8000000 - (1+1+1) + 1 = 7999998 ... wait, what is the last +1 there? That's a fake additional cycle of delay, to make the final r18 loop pretend it costs the same as a non-final one, i.e. 197122 cycles.

And that's it, the initial r18 must be enough to wait at least 7999998 cycles: r18 = (7999998 + 197122 - 1) div 197122 = 41. The "+ 197122 - 1" part rounds the division up, which makes sure the surplus cycles fit the constraint 0 <= abundant_cycles < 197122.

41 * 197122 = 8082002 ... this is too much, but now we can shave the extra cycles off by also setting r19 and r20 to particular values, to fine-tune the delay. So how much has to be shaved off? 8082002 - 7999998 = 82004 cycles.

A single r19 loop takes 770 cycles when branching and 769 when exiting, so again let's avoid the 769 by adjusting 82004 to only 82003 to be shaved off. 82003 div 770 = 106: 106 r19 loops can be skipped, r19 = 256 - 106 = 150. This shaves off 106 * 770 = 81620 cycles, so 82003 - 81620 = 383 cycles remain to be shaved off.

A single r20 loop takes 3 cycles when branching and 2 when exiting. Again I will take into account that the exiting loop is only 2 cycles: 383 => 382 to shave off. And 382 div 3 = 127, remainder 1. r20 = 256 - 127 = 129, and do one less to shave off an additional 3 cycles (to cover that remainder), giving 128. Then a wait of 2 cycles (3-1) is missing to make it a full 8 million.

So:

    ldi  r18, 41
    ldi  r19, 150
    ldi  r20, 128
L1: dec  r20
    brne L1
    dec  r19
    brne L1
    dec  r18
    brne L1

According to my calculations this should wait exactly 8000000-2 cycles (if not interrupted by something else).

Let's try to verify:

Initial r20: 127 * 3 + 1 * 2 = 383 cycles
Initial r19: 1*(383+1+2) + 148*(767+1+2) + 1*(767+1+1) = 115115 cycles (the initial, incomplete r20 loop once, then 149 full r20 loops, the last one costing 1 cycle less because its brne falls through)
The r18 total: 1*(115115+1+2) + 39*(197119+1+2) + 1*(197119+1+1) = 7999997 cycles.

And the three ldi are +3 cycles = 7999997+3 = 8000000.
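The hand count can also be cross-checked by stepping through the loop one instruction at a time. The following Python sketch does that; `simulate_delay` is a hypothetical helper, it models only the cycle costs given above (1 for `ldi` and `dec`, 2 or 1 for `brne`), and it ignores interrupts.

```python
def simulate_delay(r18, r19, r20):
    """Count cycles for the three ldi plus the nested dec/brne loop."""
    cycles = 3                        # three ldi, 1 cycle each
    while True:
        r20 = (r20 - 1) & 0xFF        # dec r20
        cycles += 1
        if r20 != 0:                  # brne L1 taken
            cycles += 2
            continue
        cycles += 1                   # brne falls through
        r19 = (r19 - 1) & 0xFF        # dec r19
        cycles += 1
        if r19 != 0:                  # brne L1 taken
            cycles += 2
            continue
        cycles += 1                   # brne falls through
        r18 = (r18 - 1) & 0xFF        # dec r18
        cycles += 1
        if r18 != 0:                  # brne L1 taken
            cycles += 2
            continue
        cycles += 1                   # final fall-through: the delay is over
        return cycles


total = simulate_delay(41, 150, 128)
print(total, total / 16_000_000)      # cycles, and seconds at 16 MHz
```

With the values from the generator this reports 8000000 cycles, i.e. 0.5 s at 16 MHz.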

And the missing 2 cycles are nowhere to be seen, so I made a mistake somewhere.

As you can see, the math behind it is reasonably simple, but very tedious to do by hand, and prone to mistakes...

Ah, I think I know where I made the mistake. When I'm shaving off the surplus cycles, the terminating loop is not involved (that's part of the actual delay process), so I shouldn't have adjusted the to_shave_off cycles by -1. Then, after skipping 106 r19 loops (r19 = 150), I would still have 82004 - 81620 = 384 cycles to shave off, and that's exactly 384/3 = 128 loops to skip in r20, so r20 = 256 - 128 = 128. No remainder, no missing cycle, a perfect 8 million.
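The corrected procedure (round r18 up, then shave the surplus off through r19 and r20 without the -1 adjustments) can be turned into a small calculator. This Python sketch is one possible reading of the steps above, not the online generator's actual code. `delay_registers` and `loop_cycles` are hypothetical names, and the `pad` value is an addition of this sketch: it is the number of cycles left to fill separately (for example with nop) when the surplus is not a multiple of 3.

```python
R20_STEP = 3          # dec r20 + taken brne
R20_FULL = 767        # full r20 loop, zero to zero
R19_STEP = 770        # full r20 loop + dec r19 + taken brne
R19_FULL = 197119     # full r19 loop, zero to zero
R18_STEP = 197122     # full r19 loop + dec r18 + taken brne


def delay_registers(target_cycles):
    """Return (r18, r19, r20, pad) for the three-level delay loop."""
    # Subtract the three ldi; add 1 so the final r18 pass (whose brne
    # falls through) can be counted like a taken one.
    total = target_cycles - 3 + 1
    r18 = -(-total // R18_STEP)                 # ceiling division
    if not 1 <= r18 <= 256:
        raise ValueError("target out of range for three 8-bit counters")
    surplus = r18 * R18_STEP - total            # cycles to shave off
    skip19 = min(surplus // R19_STEP, 255)      # r19 passes to skip
    surplus -= skip19 * R19_STEP
    skip20 = -(-surplus // R20_STEP)            # r20 passes to skip, rounded up
    if skip20 > 255:
        # Too much for r20 alone: skip one more r19 pass and pad instead.
        if skip19 == 255:
            raise ValueError("target too short for this sketch")
        skip19 += 1
        surplus -= R19_STEP                     # now slightly negative
        skip20 = 0
    pad = skip20 * R20_STEP - surplus           # cycles shaved beyond the surplus
    # A counter value of 0 means 256 passes, hence the "& 0xFF".
    return r18 & 0xFF, (256 - skip19) & 0xFF, (256 - skip20) & 0xFF, pad


def loop_cycles(r18, r19, r20):
    """Closed-form cycle count of the loop, including the three ldi."""
    n18, n19, n20 = (r or 256 for r in (r18, r19, r20))
    first_r20 = n20 * R20_STEP - 1
    first_r19 = first_r20 + (n19 - 1) * R20_FULL + n19 * 3 - 1
    return 3 + first_r19 + (n18 - 1) * R19_FULL + n18 * 3 - 1


r18, r19, r20, pad = delay_registers(8_000_000)
print(r18, r19, r20, pad, loop_cycles(r18, r19, r20) + pad)
```

For 8000000 cycles this reproduces the values 41, 150 and 128 with nothing left to pad.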

If you have trouble following this reverse calculation, try it the other way around: imagine 2-bit registers (values 0..3 only), do a similar loop on paper with r18=r19=r20=2, and count the cycles manually to see how it evolves, i.e. 3x ldi = +3, dec r20, brne, dec r20, brne (fall through) = +5 cycles, dec r19, brne = +3, ... etc.
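The 2-bit thought experiment is small enough to replay mechanically as well. The sketch below is a generic Python stepper with a configurable register width; `simulate` and its `bits` parameter are hypothetical, chosen only for this illustration.

```python
def simulate(counters, bits=8):
    """Step through nested dec/brne loops and return the total cycle count.

    `counters` lists the initial register values, outermost loop first.
    The count includes one 1-cycle ldi per counter.
    """
    mask = (1 << bits) - 1
    regs = list(counters)
    cycles = len(regs)                 # one ldi per counter
    level = len(regs) - 1              # execution starts at the innermost loop
    while level >= 0:
        regs[level] = (regs[level] - 1) & mask   # dec
        cycles += 1
        if regs[level] != 0:
            cycles += 2                # brne taken: jump back to the innermost loop
            level = len(regs) - 1
        else:
            cycles += 1                # brne falls through to the next loop out
            level -= 1
    return cycles


print(simulate([2, 2, 2], bits=2))     # r18 = r19 = r20 = 2 with 2-bit registers
```

For r18=r19=r20=2 with 2-bit registers this comes to 84 cycles, which is short enough to compare against a pencil-and-paper trace.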

Edit: and this was explained before by Jester in his links. And I'm too lazy to boil this down to a simple formula for creating your own online calculator.

Solution 1
5 2017-11-28 07:06:10

Solution 2
3 2017-11-23 12:28:39
