
Inline assembly optimisation problem on AVR GCC (ATtiny1614)

I'm trying to develop a delay function for the ATtiny1614 (using Atmel Studio 7). There is an existing platform function, _delay_us(), which does something similar, but this is as much a learning experience as a way to be able to tweak my own code.

For the sake of delay resolution, and a minimal and consistent delay time, I decided to go for inline assembly.

I made the following: (snippet)

__attribute__((__always_inline__)) static inline void delay_loops(volatile uint32_t numLoops) {
  asm volatile(
    "loop_1%=:          \n\t"
    "   subi %A[numLoops], 1    \n\t"
    "   sbci %B[numLoops], 0    \n\t"
    "   sbci %C[numLoops], 0    \n\t"
    "   sbci %D[numLoops], 0    \n\t"
    "   brcc loop_1%=       \n\t"       
    :                           
    :[numLoops] "d" (numLoops)              
    // d=select upper register (r16-31) only
  );
}

int main(void)
{
   ...

   delay_loops(10);
   delay_loops(12);

   ...
}

So far, so good. Everything works as expected and the following code is generated:

 3ec:   8a e0           ldi r24, 0x0A   ; 10
 3ee:   90 e0           ldi r25, 0x00   ; 0
 3f0:   a0 e0           ldi r26, 0x00   ; 0
 3f2:   b0 e0           ldi r27, 0x00   ; 0

000003f4 <loop_1333>:
 3f4:   81 50           subi    r24, 0x01   ; 1
 3f6:   90 40           sbci    r25, 0x00   ; 0
 3f8:   a0 40           sbci    r26, 0x00   ; 0
 3fa:   b0 40           sbci    r27, 0x00   ; 0
 3fc:   d8 f7           brcc    .-10        ; 0x3f4 <loop_1333>

 3fe:   8c e0           ldi r24, 0x0C   ; 12
 400:   90 e0           ldi r25, 0x00   ; 0
 402:   a0 e0           ldi r26, 0x00   ; 0
 404:   b0 e0           ldi r27, 0x00   ; 0

00000406 <loop_1341>:
 406:   81 50           subi    r24, 0x01   ; 1
 408:   90 40           sbci    r25, 0x00   ; 0
 40a:   a0 40           sbci    r26, 0x00   ; 0
 40c:   b0 40           sbci    r27, 0x00   ; 0
 40e:   d8 f7           brcc    .-10        ; 0x406 <loop_1341>

The registers are preloaded with the given loop value, and the loop then counts down until the subtraction borrows (so the body actually runs numLoops + 1 times).
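To check the pass count and cycle cost of this loop without hardware, here is a host-side C sketch that replays the SUBI/SBCI/BRCC sequence byte by byte. It is a model, not AVR code: `loop_model` and `model_delay_loops` are hypothetical names, and the cycle figures assume the AVR instruction set timings of 1 cycle for SUBI/SBCI and 2 cycles for a taken BRCC (1 when not taken).

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (hypothetical helper, not AVR code) of
 *   loop: subi A,1 / sbci B,0 / sbci C,0 / sbci D,0 / brcc loop
 * Bytes A..D are the little-endian bytes of numLoops (r24..r27 above). */
typedef struct {
    uint64_t iterations;  /* passes through the loop body */
    uint64_t cycles;      /* assumed: SUBI/SBCI 1 cycle, BRCC 2 if taken, 1 if not */
    uint32_t final_value; /* what is left in the four registers afterwards */
} loop_model;

static loop_model model_delay_loops(uint32_t numLoops)
{
    uint8_t reg[4];
    for (int i = 0; i < 4; ++i)
        reg[i] = (uint8_t)(numLoops >> (8 * i));

    loop_model m = {0, 0, 0};
    int carry;
    do {
        carry = 0;
        for (int i = 0; i < 4; ++i) {
            /* SUBI subtracts 1 from byte A; each SBCI subtracts 0 plus the borrow. */
            int sub = ((i == 0) ? 1 : 0) + carry;
            carry = reg[i] < sub;              /* borrow out of this byte = AVR carry */
            reg[i] = (uint8_t)(reg[i] - sub);
        }
        m.iterations++;
        m.cycles += 4u + (carry ? 1u : 2u);    /* four ALU ops + BRCC */
    } while (!carry);                          /* BRCC: loop while carry is clear */

    m.final_value = (uint32_t)reg[0] | ((uint32_t)reg[1] << 8) |
                    ((uint32_t)reg[2] << 16) | ((uint32_t)reg[3] << 24);
    /* The loop can only exit on the borrow out of 0 - 1, so every byte is 0xFF. */
    assert(m.final_value == 0xFFFFFFFFu);
    return m;
}
```

For numLoops = 10 it gives 11 passes and 65 cycles (6 * 10 + 5); the extra pass is the one in which 0 - 1 borrows and BRCC falls through.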

However, if I change the main code to:

  int main(void)
  {
    ...

    delay_loops(12);    // changed 10->12
    delay_loops(12);

    ...
  }

then the second delay becomes seemingly endless (or at least longer than my logic analyser can capture).

The compiled assembly reveals the following:

 3ec:   8c e0           ldi r24, 0x0C   ; 12
 3ee:   90 e0           ldi r25, 0x00   ; 0
 3f0:   a0 e0           ldi r26, 0x00   ; 0
 3f2:   b0 e0           ldi r27, 0x00   ; 0

000003f4 <loop_1332>:
 3f4:   81 50           subi    r24, 0x01   ; 1
 3f6:   90 40           sbci    r25, 0x00   ; 0
 3f8:   a0 40           sbci    r26, 0x00   ; 0
 3fa:   b0 40           sbci    r27, 0x00   ; 0
 3fc:   d8 f7           brcc    .-10        ; 0x3f4 <loop_1332>

000003fe <loop_1339>:
 3fe:   81 50           subi    r24, 0x01   ; 1
 400:   90 40           sbci    r25, 0x00   ; 0
 402:   a0 40           sbci    r26, 0x00   ; 0
 404:   b0 40           sbci    r27, 0x00   ; 0
 406:   d8 f7           brcc    .-10        ; 0x3fe <loop_1339>

The input value (12) is not loaded again before the second 'call' of delay_loops(); the second loop simply starts with whatever values the first loop left in r24..r27. I can only assume the compiler does not know that I changed r24..r27, believes they still hold 12, and therefore optimises the second initialisation away.
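This also explains why the delay looks endless rather than merely wrong. The first loop can only exit on the borrow out of 0 - 1, so it leaves 0xFFFFFFFF in r24..r27, and the second loop counts down from there. Below is a rough cost estimate as a C sketch, assuming the AVR timings of 1 cycle for SUBI/SBCI, 2 for a taken BRCC and 1 for a not-taken one, plus an illustrative clock frequency; `loop_cycles`, `loop_seconds` and `LOOP_EXIT_VALUE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Value left in r24..r27 when the loop exits: the exit is triggered by
 * the borrow out of 0 - 1, so every byte has wrapped to 0xFF. */
#define LOOP_EXIT_VALUE 0xFFFFFFFFu

/* Cycles for one run of the subi/sbci/sbci/sbci/brcc loop starting at
 * `start`. The body runs start + 1 times; each pass costs 6 cycles except
 * the last, where BRCC is not taken (5 cycles). */
static uint64_t loop_cycles(uint32_t start)
{
    uint64_t passes = (uint64_t)start + 1u;
    return passes * 6u - 1u;
}

/* Wall-clock duration of that run at an assumed CPU clock in Hz. */
static double loop_seconds(uint32_t start, double f_cpu_hz)
{
    assert(f_cpu_hz > 0.0);
    return (double)loop_cycles(start) / f_cpu_hz;
}
```

Starting from LOOP_EXIT_VALUE that is 25,769,803,775 cycles: roughly 1288 s (about 21 minutes) at an assumed 20 MHz, or about 7731 s (just over two hours) at 20 MHz / 6 (about 3.33 MHz), which I believe is the ATtiny1614's out-of-reset clock. Either way it is far longer than a typical logic analyser capture.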

How do I force proper initialisation?

Thanks

The avr-libc Inline Assembler Cookbook explains what to do when a single operand is used for both input and output.

Following its example, I think you should try replacing the two operand lines (the ones that start with colons) with something like this:

:[numLoops] "=d" (numLoops)   // output: the asm overwrites numLoops
:"0" (numLoops)               // input: must use the same register as operand 0
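Put together, the corrected function might look like the sketch below. It uses GCC's `"+d"` read-write constraint, which as far as I know is equivalent to the `"=d"` output plus `"0"` matching input above: either way the compiler is told the asm modifies the register, so it can no longer assume the register still holds the value it loaded. I've also dropped `volatile` from the parameter, on the assumption that it is no longer needed once the operand is marked read-write (a volatile local may also force an extra round trip through the stack). The `#else` branch and `delay_loops_host_final` are hypothetical additions, only there so the file builds and can be sanity-checked on a PC; they are not part of the fix.

```c
#include <stdint.h>

#if defined(__AVR__)

/* Busy-wait for numLoops + 1 passes of the subi/sbci/brcc loop. */
__attribute__((__always_inline__)) static inline void delay_loops(uint32_t numLoops)
{
    __asm__ volatile(
        "loop_1%=:                 \n\t"
        "   subi %A[numLoops], 1   \n\t"
        "   sbci %B[numLoops], 0   \n\t"
        "   sbci %C[numLoops], 0   \n\t"
        "   sbci %D[numLoops], 0   \n\t"
        "   brcc loop_1%=          \n\t"
        /* "+d": upper register (r16-r31), both read AND written by the asm.
         * Equivalent form: [numLoops] "=d" (numLoops) : "0" (numLoops) */
        : [numLoops] "+d" (numLoops)
    );
}

#else

#include <assert.h> /* used by host-side checks only */

/* Hypothetical host-only stand-in so this file compiles off-target:
 * the same countdown in plain C, recording the final counter value. */
static uint32_t delay_loops_host_final;

static inline void delay_loops(uint32_t numLoops)
{
    int borrow;
    do {
        borrow = (numLoops == 0u); /* 0 - 1 borrows, like the AVR carry */
        numLoops -= 1u;
    } while (!borrow);             /* mirrors BRCC: loop while no borrow */
    delay_loops_host_final = numLoops;
}

#endif
```

With this version I would expect each of two back-to-back `delay_loops(12)` calls to get its own set of `ldi` instructions again, though I haven't checked the exact listing.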
