
Using stack trace to debug unknown program exception on Coldfire MCF5235 in GDB (Eclipse)

At a certain point in my C application (running on bare metal, in supervisor mode), while using the CAN controller via a third-party library, an Illegal Instruction fault was occurring, which is caught in an ISR; by that point, the program counter, fault, and return address in the exception stack frame available to the ISR were already 0. When I first encountered it, I was able to back up the stack a bit, and saw a stack trace like this:

Thread [1] <main> (Suspended : Step)    
    0x0    
    0x41f42200    
    ... 
    timerInterrupt() at timer.c:1,175 0x2432ec    
    0x41902210
    ...
    main() at main.c:1,433 0x211a44

Where 0x40000000 is IPSBAR for this processor.

I ran the application several times with a known state that could reproduce this issue quickly, usually down to the exact same stack trace and saved instruction at the interrupt/exception before the jump to 0x0. Through testing, I noticed that the jump would only happen on the instruction following interrupts being re-enabled after being disabled, or in a section of code where interrupts weren't masked. So, I figured that this must be a user interrupt causing the issue, though I wasn't sure why it would appear to try to call a handler that wasn't set when the interrupt wasn't enabled in the mask. I'm not 100% sure of the meaning of the addresses in the IPSBAR range that precede an ISR being called, but since they're the same for each call of that ISR, I figure I could use them to indicate the source of the last interrupt/exception.
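One way to test the idea of reading those stacked values as a source indicator is to decode them as the first longword of a ColdFire exception stack frame (format, fault status, vector number, and saved status register) rather than as addresses. The sketch below assumes that layout and assumes that interrupt controller 0 sources map to vectors 64 through 127; the struct and function names are hypothetical, and whether the debugger is really showing frame words here is itself an assumption.

```c
#include <stdint.h>

/* Fields of the first longword of a ColdFire exception stack frame
 * (assumed layout; check it against the core's reference manual). */
struct cf_frame_word {
    unsigned format;       /* bits 31:28, stack alignment format */
    unsigned fault_status; /* FS[3:2] from bits 27:26, FS[1:0] from bits 17:16 */
    unsigned vector;       /* bits 25:18, exception vector number */
    unsigned sr;           /* bits 15:0, status register at the time of the exception */
};

/* Split a stacked 32-bit value into the assumed frame fields. */
struct cf_frame_word cf_decode_frame_word(uint32_t w)
{
    struct cf_frame_word f;
    f.format = (w >> 28) & 0xFu;
    f.fault_status = (((w >> 26) & 0x3u) << 2) | ((w >> 16) & 0x3u);
    f.vector = (w >> 18) & 0xFFu;
    f.sr = w & 0xFFFFu;
    return f;
}

/* Map a vector number to an interrupt controller 0 source, assuming
 * sources 0..63 use vectors 64..127. Returns -1 for other vectors. */
int cf_intc0_source(unsigned vector)
{
    return (vector >= 64u && vector <= 127u) ? (int)(vector - 64u) : -1;
}
```

Under those assumptions, 0x41902210 from the trace above decodes to vector 100 with a saved SR of 0x2210, and 0x41f42200 decodes to vector 125 with a saved SR of 0x2200.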

So, I added a default interrupt handler to all interrupt vectors on interrupt controller 0 before the normal handlers were added and ran the application again - and lo and behold, a breakpoint set in the default handler was hit when that suspected interrupt was fired (i.e., the stack looked like this):

Thread [1] <main> (Suspended : Step)    
    __DefaultInterrupt() at interrupts.c    
    0x41f42200    
    ...
    timerInterrupt() at timer.c:1,175 0x2432ec    
    0x41902210       
    ...
    main() at main.c:1,433 0x211a44
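The catch-all step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the application: the table layout (256 function-pointer entries, with interrupt controller 0 on vectors 64 through 127), the type name `isr_fn`, and both function names are assumptions, and a real handler would need the toolchain's interrupt attribute or an assembly wrapper.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*isr_fn)(void);

enum {
    VECTOR_COUNT       = 256, /* assumed size of the RAM vector table */
    INTC0_FIRST_VECTOR = 64,  /* assumed first vector of interrupt controller 0 */
    INTC0_SOURCE_COUNT = 64   /* assumed number of sources on that controller */
};

/* Hypothetical catch-all handler: set a breakpoint here and inspect the
 * interrupt acknowledge register and the stack when it is hit. */
void default_interrupt_stub(void)
{
}

/* Point every interrupt controller 0 vector at the fallback handler.
 * Call this before the normal handlers are installed so that only the
 * unclaimed vectors keep the fallback. */
void install_default_handlers(isr_fn table[VECTOR_COUNT], isr_fn fallback)
{
    assert(table != NULL);
    for (size_t i = 0; i < INTC0_SOURCE_COUNT; ++i) {
        table[INTC0_FIRST_VECTOR + i] = fallback;
    }
}
```

Giving each suspected source its own small handler instead of one shared fallback makes it possible to tell from the breakpoint alone which vector was actually taken.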

Observing the value of SWIACK0 in that function, I saw that the interrupt source was 100 (user interrupt 36, the PIT0 interrupt). Well, that already has an ISR (timerInterrupt() in the stack above). I next checked the area of RAM where the ISR function pointers are stored to see if the timer interrupt handler's function pointer had been corrupted, but nothing had changed between the time all interrupt handlers were set and the time the breakpoint in the default handler was hit.
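The handler-table check described above can also be done in code instead of by eye in the memory view. The sketch below is only an illustration under assumed names (`isr_fn`, `isr_table_snapshot`, `isr_table_first_diff`, and the placeholder `example_isr` are all hypothetical): it copies the table once after setup and later reports the first entry that differs.

```c
#include <stddef.h>

typedef void (*isr_fn)(void);

/* Placeholder handler, present only so the example has something to store. */
void example_isr(void)
{
}

/* Copy the live handler table into a snapshot buffer of the same length. */
void isr_table_snapshot(isr_fn *dst, const isr_fn *src, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}

/* Return the index of the first entry that differs from the snapshot,
 * or -1 if the two tables are identical. */
long isr_table_first_diff(const isr_fn *snapshot, const isr_fn *live, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (snapshot[i] != live[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

Calling `isr_table_first_diff` from the default handler, or evaluating it from the debugger at the breakpoint, gives a yes/no answer on table corruption without comparing entries manually.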

I also noticed that if I set the interrupt level of the interrupt handler for the CAN controller to 7 (the same handler services all 18 FlexCAN interrupt sources), the issue doesn't occur. I'm not sure what to make of it just yet, but the issue does absolutely point to either the CAN library or the controller being at fault.

EDIT - I wasn't sure at this point exactly which ISR was handling the interrupt, but I've added individual handlers to the initially suspected interrupt sources, and it's always interrupt source 63, which according to the documentation is an unused interrupt and the last one on interrupt controller 0.

EDIT 2: It occurred to me that the active interrupt source in SWIACK0 might actually be correct, and that there might be another issue, such as the vector base address getting rewritten. Unfortunately I'm not sure how to read it back, as it's a write-only value. I initially thought that the interrupt source for PIT0 was in that register because the default interrupt handler was getting called from within the timer interrupt handler, but it's also reported when the timer interrupt isn't on the stack. The reference manual indicates that the on-chip debug module can be used to read back control registers, and therefore VBR, but I don't see any information in the debug manual on how to do this.
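Since VBR cannot be read back by the program, one partial workaround is to route every VBR write through a wrapper that keeps a software copy. The sketch below is an assumption-laden illustration: the function names are hypothetical, the 1 MB alignment check assumes the low 20 bits of VBR are not implemented on this core, and the `movec` line is only compiled when GCC targets ColdFire (`__mcoldfire__`). The shadow only records what the wrapper wrote, so it cannot reveal a write that bypasses it, for example one made inside a third-party library.

```c
#include <stdint.h>

/* Last value written to VBR through vbr_set(). */
static uint32_t vbr_shadow;

/* Write the hardware register; compiled as a no-op on other targets. */
static void vbr_write_hw(uint32_t base)
{
#if defined(__mcoldfire__)
    __asm__ volatile ("movec %0,%%vbr" : : "r" (base));
#else
    (void)base; /* host build: nothing to write */
#endif
}

/* Set VBR and remember the value. Returns 0 on success, or -1 if the
 * address is not 1 MB aligned (assumed hardware requirement). */
int vbr_set(uint32_t base)
{
    if ((base & 0x000FFFFFu) != 0u) {
        return -1;
    }
    vbr_write_hw(base);
    vbr_shadow = base;
    return 0;
}

/* Return the last value written through vbr_set(). */
uint32_t vbr_get_shadow(void)
{
    return vbr_shadow;
}
```

Setting a breakpoint inside `vbr_set` also shows every caller that changes the vector base, which narrows down whether a rewrite comes from code under the application's control.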

To make a rambling story short, I want to find out the source of the jump to hyperspace, or what information I can use to find it.

  • What's the meaning of the addresses in the IPSBAR range getting pushed onto the stack?

  • Since those addresses seem to be completely tied to their source, is there a way to use a value in the stack (e.g., 0x41f42200 in the first example) to determine the source of the interrupt/exception that pushed it onto the stack?

  • Am I going about this completely wrong? I'm more than happy to abandon any and all of this line of thinking.

Thanks for any help or insight, and I'll update this with more (concise) information when I can rub two brain cells together to do it.

Solved the problem: it turned out to be covered in the CPU's errata.
