How do AVR Assembly BRNE delay loops work?

An online delay loop generator gives me this delay loop with a runtime of 0.5s for a chip running at 16MHz.

The questions on my mind are:

  1. Do the branches keep branching if the register becomes negative?
  2. How exactly does one calculate the values that are loaded in the beginning?

    ldi  r18, 41
    ldi  r19, 150
    ldi  r20, 128
L1: dec  r20
    brne L1
    dec  r19
    brne L1
    dec  r18
    brne L1

To answer your questions exactly:

1: The DEC instruction doesn't know about 'signed' numbers, it just decrements an 8-bit register. The miracle of two's complement arithmetic makes this work at the wraparound (0x00 -> 0xFF is the same bit pattern as 0 -> -1). The DEC instruction also sets the Z flag in the status register when the result is zero, and BRNE uses that flag to determine if branching should happen. So yes, the branch keeps being taken after the wraparound, until the register reaches exactly zero.
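To make the wraparound concrete, here is a minimal Python sketch of what DEC does to an 8-bit register and to the Z flag. The function name `dec8` is made up for this illustration; it only models the result and Z, not the other status flags.

```python
def dec8(value):
    """Model AVR DEC on an 8-bit register: returns (new_value, z_flag)."""
    result = (value - 1) & 0xFF   # 8-bit wraparound: 0x00 -> 0xFF
    z_flag = result == 0          # Z is set only when the result is zero
    return result, z_flag

# BRNE branches while Z is clear, so 0xFF ("-1" if read as signed) still branches.
print(dec8(0x00))  # (255, False): wrapped around, branch is taken
print(dec8(0x01))  # (0, True): Z set, BRNE falls through
```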

2: You can see from the AVR manual that DEC is a single cycle instruction. BRNE is also a single cycle when not branching, and 2 cycles when branching. Therefore, to compute the time of your loop, you need to count the number of times each path will be taken.

Consider a single DEC/BRNE loop:

    ldi r20, 0      ; LDI only works on r16..r31
L1: dec r20
    brne L1

This loop will execute exactly 256 times, which is 256 cycles of DEC and 511 cycles of BRNE (255 taken branches at 2 cycles each, plus the final untaken branch at 1 cycle), for a total of 767 cycles. At 16MHz, that's just under 48us.
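The count for a single DEC/BRNE loop can be checked by stepping through it. This is a rough Python sketch, not a real AVR simulator: it assumes DEC is 1 cycle and BRNE is 2 cycles when taken and 1 when not, and the name `single_loop_cycles` is invented for the example.

```python
def single_loop_cycles(start):
    """Count cycles of 'L1: dec r / brne L1' for an 8-bit register."""
    reg, cycles = start & 0xFF, 0
    while True:
        reg = (reg - 1) & 0xFF
        cycles += 1              # DEC: 1 cycle
        if reg != 0:
            cycles += 2          # BRNE taken: 2 cycles
        else:
            cycles += 1          # BRNE not taken: 1 cycle, loop exits
            return cycles

cycles = single_loop_cycles(0)
print(cycles, cycles / 16)       # 767 47.9375 (cycles, microseconds at 16 MHz)
```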

Wrapping that in an outer delay loop:

    ldi r19, 10     ; outer counter (LDI needs r16..r31)
    ldi r20, 0      ; inner counter, 0 means 256 iterations
L1: dec r20
    brne L1
    dec r19
    brne L1

You can see that the outer loop counter will decrement every time the inner loop counter hits 0. Thus in our example the outer loop DEC/BRNE will happen 10 times (about 30 cycles in total), and the inner loop will happen 10 x 256 times (10 x 767 cycles), so the total for this loop is 7699 cycles, or roughly 10 x 48us plus 2us for about 481us. Similarly for 3 nested loops.
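The two-level case can be counted the same way. The sketch below is only a model under the same timing assumptions (DEC 1 cycle, BRNE 2 taken / 1 not taken); `two_loop_cycles` is a hypothetical helper, and the LDI instructions are not counted.

```python
def two_loop_cycles(outer, inner):
    """Count cycles of an inner DEC/BRNE loop wrapped in an outer DEC/BRNE loop."""
    cycles = 0
    while True:
        while True:                      # inner loop
            inner = (inner - 1) & 0xFF
            cycles += 1                  # DEC inner
            if inner != 0:
                cycles += 2              # BRNE taken
            else:
                cycles += 1              # BRNE not taken
                break
        outer = (outer - 1) & 0xFF
        cycles += 1                      # DEC outer
        if outer != 0:
            cycles += 2                  # BRNE taken, back to the inner loop
        else:
            cycles += 1                  # BRNE not taken, delay is over
            return cycles

cycles = two_loop_cycles(10, 0)
print(cycles, cycles / 16)               # 7699 481.1875 (cycles, microseconds at 16 MHz)
```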

From here, it's straightforward to figure out how many times each loop should execute to achieve the desired delay. Pick the largest number of outer loop iterations that still fits within the desired time, subtract that time, do the same for the next nested loop, and so on until the innermost loop fills up the small amount left.

How exactly does one calculate the values that are loaded in the beginning?

Calculate total amount of cycles => 0.5s * 16000000 = 8000000

Next, work out the total cycles of the full r20 and r19 loops (from zero to zero). AVR registers are 8 bit, so a full loop is 256 iterations ( dec 0 = 255 ). dec is 1 cycle. brne is 2 cycles when the branch is taken, 1 cycle when it is not.

So the most inner loop:

L1: dec  r20
    brne L1

Is from zero to zero ( r20=0 ): 255 * (1+2) + 1 * (1+1) = 767 cycles (255 times the branch is taken, 1 time it goes through).


The second wrapping loop working with r19 is then: 255 * (767+1+2) + 1 * (767+1+1) = 197119 cycles

The single r18 loop when branch is taken is then 197119+1+2 = 197122 cycles. (197121 when branch is not taken = final exit of delay loop, I will avoid this -1 by a trick in next step).
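The three building-block costs above can be written out as plain arithmetic. The constant names are invented for this sketch; the values follow directly from the 1-cycle DEC and 2/1-cycle BRNE timings.

```python
DEC, BRNE_TAKEN, BRNE_EXIT = 1, 2, 1

# Full r20 loop (zero to zero): 255 taken branches, 1 exit.
INNER_FULL = 255 * (DEC + BRNE_TAKEN) + (DEC + BRNE_EXIT)

# Full r19 loop: each iteration runs a full r20 loop, then DEC/BRNE on r19.
MIDDLE_FULL = (255 * (INNER_FULL + DEC + BRNE_TAKEN)
               + (INNER_FULL + DEC + BRNE_EXIT))

# One r18 iteration whose branch is taken, and the final one that exits.
R18_PASS_TAKEN = MIDDLE_FULL + DEC + BRNE_TAKEN
R18_PASS_EXIT = MIDDLE_FULL + DEC + BRNE_EXIT

print(INNER_FULL, MIDDLE_FULL, R18_PASS_TAKEN, R18_PASS_EXIT)
# 767 197119 197122 197121
```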

Now this is almost enough to calculate the initial r18 . Let's first adjust the total cycles by the O(1) code, that's three ldi instructions at 1 cycle each: total2 = 8000000 - (1+1+1) + 1 = 7999998 ... wait, what is the last +1 there? That's a fake additional cycle of delay, to make the final r18 loop pretend it costs the same as a non-final one, ie 197122 cycles.

And that's it, the initial r18 must be enough to wait at least 7999998 cycles: r18 = (7999998 + 197122 - 1) div 197122 = 41 . The " + 197122 - 1" part rounds the division up and makes sure the abundant cycles fit the constraint: 0 <= abundant_cycles < 197122 (remainder by 197122 division).

41 * 197122 = 8082002 ... this is too much, but now we can shave the extra cycles down by also setting r19 and r20 to particular values, to fine-tune the delay. So how much is to be shaved off? 8082002 - 7999998 = 82004 cycles.

The single r19 loop takes 770 cycles when branching and 769 when exiting, so again let's avoid the 769 by adjusting 82004 to only 82003 to be shaved off. 82003 div 770 = 106 : 106 r19 loops can be skipped, r19 = 256 - 106 = 150 . Now this will shave 81620 cycles, so 82003 - 81620 = 383 cycles more to be shaved off.

The single r20 loop takes 3 cycles when branching and 2 when exiting. Again I will take into account the exiting loop being only 2 cycles -> 383 => 382 to shave off. And 382 div 3 = 127 , remainder 1. r20 = 256 - 127 = 129 and do one less to shave additional 3 cycles (to cover that remainder) = 128. Then 2 cycles (3-1) wait is missing to make it a full 8mil.

So:

    ldi  r18, 41
    ldi  r19, 150
    ldi  r20, 128
L1: dec  r20
    brne L1
    dec  r19
    brne L1
    dec  r18
    brne L1

According to my calculations, this should wait exactly 8000000-2 cycles (if not interrupted by something else).

Let's try to verify:

Initial r20 : 127 * 3 + 1 * 2 = 383 cycles
Initial r19 : 1*(383+1+2) + 148*(767+1+2) + 1*(767+1+1) = 115115 cycles (that's initial r20 incomplete cycle one time, then 149 times full time r20 cycle with the final one being -1 due to exiting brne )
The r18 total: 1*(115115+1+2) + 39*(197119+1+2) + 1*(197119+1+1) = 7999997 cycles.

And the three ldi are +3 cycles = 7999997+3 = 8000000.
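The hand count can also be cross-checked with a small cycle-counting model. This is a sketch under the timing assumptions used above (LDI and DEC 1 cycle, BRNE 2 taken / 1 not taken), with no interrupts; `delay_cycles` is a made-up name, and the inner r20 loop is counted in closed form to keep it fast.

```python
def delay_cycles(r18, r19, r20):
    """Total cycles of the three LDIs plus the nested DEC/BRNE delay loop."""
    cycles = 3                            # three LDI instructions
    while True:
        n = r20 if r20 != 0 else 256      # iterations until r20 reaches 0
        cycles += (n - 1) * 3 + 2         # n-1 taken branches, then one exit
        r20 = 0                           # register is left at 0 for the next pass
        r19 = (r19 - 1) & 0xFF
        cycles += 1                       # DEC r19
        if r19 != 0:
            cycles += 2                   # BRNE taken, back to L1
            continue
        cycles += 1                       # BRNE not taken
        r18 = (r18 - 1) & 0xFF
        cycles += 1                       # DEC r18
        if r18 != 0:
            cycles += 2                   # BRNE taken, back to L1
            continue
        cycles += 1                       # BRNE not taken, delay is over
        return cycles

print(delay_cycles(41, 150, 128))         # 8000000
```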

And the missing 2 cycles are nowhere to be seen, so I made a mistake somewhere.

As you can see, the math behind it is reasonably simple, but very tedious to do by hand, and prone to mistakes...

Ah, I think I know where I made the mistake. When I'm shaving off the abundant cycles, the terminating iteration is not involved (that's part of the actual delay process), so I shouldn't have adjusted the to_shave_off cycles by -1. Then after skipping 106 r19 loops ( r19 = 150 ) I would still have 82004 - 81620 = 384 cycles to shave off, and that's exactly 384/3 = 128 loops to shave off from r20 = 256-128 = 128 . No remainder, no missing cycle, perfect 8mil.
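The corrected procedure (round r18 up, then shave the excess with r19 and r20, without the -1 adjustments) is short enough to sketch in Python. This is only my reading of the steps above, not a tested generator: `delay_values` and the constant names are invented here, and the fourth returned value is an assumption of this sketch, namely the cycles that could not be shaved, by which the delay would overshoot.

```python
INNER_FULL = 255 * 3 + 2                                  # 767, full r20 loop
MIDDLE_FULL = 255 * (INNER_FULL + 3) + (INNER_FULL + 2)   # 197119, full r19 loop
R18_PASS = MIDDLE_FULL + 3                                # 197122 per r18 iteration
R19_PASS = INNER_FULL + 3                                 # 770 per r19 iteration
R20_PASS = 3                                              # 3 per r20 iteration

def delay_values(target_cycles):
    """Return (r18, r19, r20, leftover) for a three-level DEC/BRNE delay."""
    budget = target_cycles - 3 + 1          # minus 3 LDIs, plus the fake cycle
    r18 = -(-budget // R18_PASS)            # ceiling division
    if not 1 <= r18 <= 256:
        raise ValueError("target does not fit three 8-bit loops")
    excess = r18 * R18_PASS - budget        # cycles to shave off
    skip19 = min(excess // R19_PASS, 255)   # r19 iterations skipped on the first pass
    excess -= skip19 * R19_PASS
    skip20 = min(excess // R20_PASS, 255)   # r20 iterations skipped on the first pass
    leftover = excess - skip20 * R20_PASS
    return r18 & 0xFF, (256 - skip19) & 0xFF, (256 - skip20) & 0xFF, leftover

print(delay_values(8_000_000))              # (41, 150, 128, 0)
```

A value of 0 in any returned register means 256 iterations, matching how the 8-bit counter wraps.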

If you have trouble following this reverse calculation, try it the other way: imagine 2 bit registers (0..3 values only), and do a similar loop on paper with r18=r19=r20=2, counting the cycles manually to see how it evolves, ie 3x ldi = +3, dec r20,brne,dec r20,brne(skip) = +5 cycles, dec r19, brne = +3, ... etc.
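If paper is not at hand, the same exercise can be stepped through in a few lines of Python. This is a toy model, not an AVR simulator: the `bits` parameter exists only to shrink the registers for the exercise, and `delay_cycles_stepwise` is a hypothetical name.

```python
def delay_cycles_stepwise(r18, r19, r20, bits=8):
    """Step through the nested DEC/BRNE loops one instruction pair at a time."""
    mask = (1 << bits) - 1
    regs = [r20 & mask, r19 & mask, r18 & mask]   # innermost counter first
    cycles = 3                                    # three LDI instructions
    level = 0                                     # which DEC/BRNE pair runs next
    while level < 3:
        regs[level] = (regs[level] - 1) & mask
        cycles += 1                               # DEC
        if regs[level] != 0:
            cycles += 2                           # BRNE taken: jump back to L1
            level = 0
        else:
            cycles += 1                           # BRNE not taken: fall through
            level += 1
    return cycles

# 2-bit registers, all counters starting at 2, as in the paper exercise.
print(delay_cycles_stepwise(2, 2, 2, bits=2))     # 84
```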

Edit: and this was explained before by Jester in his links. And I'm too lazy to clean this up down to some simple formula to create your own online calculator.
