Inline assembly optimisation problem on AVR GCC (ATtiny1614)
I'm trying to develop a delay function for the ATtiny1614 (using Atmel Studio 7). The platform already provides _delay_us(), which does something similar, but this is as much a learning exercise as a way of being able to tweak my own code.
For the sake of delay resolution and a minimal, consistent delay time, I decided to go for inline assembly.
I made the following (snippet):
__attribute__((__always_inline__)) static inline void delay_loops(volatile uint32_t numLoops) {
asm volatile(
"loop_1%=: \n\t"
" subi %A[numLoops], 1 \n\t"
" sbci %B[numLoops], 0 \n\t"
" sbci %C[numLoops], 0 \n\t"
" sbci %D[numLoops], 0 \n\t"
" brcc loop_1%= \n\t"
:
:[numLoops] "d" (numLoops)
// d=select upper register (r16-31) only
);
}
int main(void)
{
...
delay_loops(10);
delay_loops(12);
...
}
So far, so good. Everything works as expected and the following code is generated:
3ec: 8a e0 ldi r24, 0x0A ; 10
3ee: 90 e0 ldi r25, 0x00 ; 0
3f0: a0 e0 ldi r26, 0x00 ; 0
3f2: b0 e0 ldi r27, 0x00 ; 0
000003f4 <loop_1333>:
3f4: 81 50 subi r24, 0x01 ; 1
3f6: 90 40 sbci r25, 0x00 ; 0
3f8: a0 40 sbci r26, 0x00 ; 0
3fa: b0 40 sbci r27, 0x00 ; 0
3fc: d8 f7 brcc .-10 ; 0x3f4 <loop_1333>
3fe: 8c e0 ldi r24, 0x0C ; 12
400: 90 e0 ldi r25, 0x00 ; 0
402: a0 e0 ldi r26, 0x00 ; 0
404: b0 e0 ldi r27, 0x00 ; 0
00000406 <loop_1341>:
406: 81 50 subi r24, 0x01 ; 1
408: 90 40 sbci r25, 0x00 ; 0
40a: a0 40 sbci r26, 0x00 ; 0
40c: b0 40 sbci r27, 0x00 ; 0
40e: d8 f7 brcc .-10 ; 0x406 <loop_1341>
The registers are preloaded with the given loop value, and that number of loops is then iterated.
However, if I change the main code to
int main(void)
{
...
delay_loops(12); // changed 10->12
delay_loops(12);
...
}
then the second delay becomes seemingly endless (or at least outside the range of my logic analyser).
The compiled assembly reveals the following:
3ec: 8c e0 ldi r24, 0x0C ; 12
3ee: 90 e0 ldi r25, 0x00 ; 0
3f0: a0 e0 ldi r26, 0x00 ; 0
3f2: b0 e0 ldi r27, 0x00 ; 0
000003f4 <loop_1332>:
3f4: 81 50 subi r24, 0x01 ; 1
3f6: 90 40 sbci r25, 0x00 ; 0
3f8: a0 40 sbci r26, 0x00 ; 0
3fa: b0 40 sbci r27, 0x00 ; 0
3fc: d8 f7 brcc .-10 ; 0x3f4 <loop_1332>
000003fe <loop_1339>:
3fe: 81 50 subi r24, 0x01 ; 1
400: 90 40 sbci r25, 0x00 ; 0
402: a0 40 sbci r26, 0x00 ; 0
404: b0 40 sbci r27, 0x00 ; 0
406: d8 f7 brcc .-10 ; 0x3fe <loop_1339>
The input value (12) is not initialised for the second 'call' of delay_loops(). The assembly just continues into the second loop with the (altered) register values it still has. I can only assume the compiler does not know I changed r24..r27, assumes they still hold 12, and therefore optimises the proper initialisation away.
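To make the symptom concrete, here is a small host-side model of the subi/sbci/brcc loop. It is only a sketch in plain C, not AVR code; the names `loop_result` and `model_delay_loop` are hypothetical and exist purely for illustration. It assumes the loop behaves as the listing suggests: a 32-bit subtract of 1 that sets carry on borrow, repeated while carry is clear.

```c
#include <stdint.h>

/* Result of one modelled run of the delay loop. */
typedef struct {
    uint32_t reg_after;   /* value left in the 32-bit register group */
    uint64_t iterations;  /* number of times the loop body executed */
} loop_result;

/* Host-side model (not AVR code) of:
 *   subi/sbci/sbci/sbci  -> 32-bit subtract of 1, carry set on borrow
 *   brcc loop            -> repeat while carry is clear
 */
static loop_result model_delay_loop(uint32_t reg)
{
    loop_result r = { reg, 0 };
    int carry;
    do {
        carry = (r.reg_after == 0);  /* borrow happens only when subtracting from 0 */
        r.reg_after -= 1u;           /* unsigned wrap: 0 becomes 0xFFFFFFFF */
        r.iterations++;
    } while (!carry);
    return r;
}
```

Under this model, a loop started at 12 executes its body 13 times and leaves 0xFFFFFFFF behind. A second loop that starts from that leftover value instead of a fresh 12 would need 2^32 iterations before the borrow occurs, which is consistent with the delay looking endless.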
How do I force proper initialisation?
Thanks
The inline assembler cookbook explains what you need to do when one operand is used for both input and output.
Following their example, I think you should try replacing the two lines that start with colons with something like this:
:[numLoops] "=d" (numLoops)
:"0" (numLoops)
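For reference, a sketch of how the whole function might look with that change applied. This version uses GCC's "+d" read-write constraint, which should be equivalent to pairing "=d" with a matching "0" input. Dropping `volatile` from the parameter and adding the "cc" clobber are my own assumptions, not part of the original suggestion. The `#else` branch is only a host-side stand-in so the snippet compiles off-target; it is not timing-accurate.

```c
#include <stdint.h>

__attribute__((__always_inline__)) static inline void delay_loops(uint32_t numLoops)
{
#if defined(__AVR__)
    asm volatile(
        "loop_1%=:                    \n\t"
        "    subi %A[numLoops], 1     \n\t"
        "    sbci %B[numLoops], 0     \n\t"
        "    sbci %C[numLoops], 0     \n\t"
        "    sbci %D[numLoops], 0     \n\t"
        "    brcc loop_1%=            \n\t"
        : [numLoops] "+d" (numLoops)  /* read AND written: compiler must reload it */
        :                             /* no pure inputs */
        : "cc"                        /* status flags change (assumed harmless to state) */
    );
#else
    /* Host stand-in only: a plain counted loop, not cycle-accurate. */
    volatile uint32_t n = numLoops;
    while (n-- != 0) { }
#endif
}
```

Because the operand is now declared as an output as well, the compiler can no longer assume the registers still hold 12 after the first expansion, so two consecutive `delay_loops(12);` calls should each get their own `ldi` sequence. It is worth confirming this in the generated listing.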