_Unwind_Backtrace for a different context on FreeRTOS

Question

Hello, I am trying to implement error handling in a FreeRTOS project. The handler is triggered by the watchdog interrupt, just before the watchdog reset. The idea is to log the task name and call stack of the failed task.
I have managed to backtrace a call stack, but in the wrong context: that of the interrupt. What I need is the context of the failed task, which is stored in pxCurrentTCB, but I do not know how to tell _Unwind_Backtrace to use it instead of the interrupt context it is called from. In other words, I want _Unwind_Backtrace to unwind not the context it is called from but the different context found in pxCurrentTCB. I have searched and tried to understand how _Unwind_Backtrace works, without success, so please help.
Any help will be appreciated, especially sample code. Thank you.

_Unwind_Reason_Code unwind_backtrace_callback(_Unwind_Context * context, void * arg)
{
    static uint8_t row = 1;
    char str_buff[BUFF_SIZE];
    uintptr_t pc = _Unwind_GetIP(context);
    if (pc && row < MAX_ROW) {
        snprintf(str_buff, sizeof(str_buff), "%u .. 0x%lx", (unsigned)row, (unsigned long)pc);
        printString(str_buff, 0, ROW_SIZE * row++);
    }
    return _URC_NO_REASON;
}

void WDOG1_DriverIRQHandler(void)
{
    printString(pxCurrentTCB->pcTaskName, 0, 0);

    _Unwind_Backtrace(unwind_backtrace_callback, 0);

    while(1) Wdog_Service();
}

Answer 1

As it turns out, OpenMRN implements exactly the solution you are looking for: https://github.com/bakerstu/openmrn/blob/master/src/freertos_drivers/common/cpu_profile.hxx

More information can be found here: Stack Backtrace for ARM core using GCC compiler (when there is a MSP to PSP switch). To quote this post:

This is doable, but it needs access to internal details of how libgcc implements the _Unwind_Backtrace function. Fortunately the code is open source, but depending on such internal details is brittle in that it may break in future versions of armgcc without any notice.

Generally, reading through the libgcc source that does the backtrace, it creates an in-memory virtual representation of the CPU core registers, then uses this representation to walk up the stack, simulating exception throws. The first thing that _Unwind_Backtrace does is fill in this context from the current CPU registers, then call an internal implementation function.
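
For comparison, here is a minimal sketch of the ordinary, public use of _Unwind_Backtrace, which always starts from the registers of the calling context. The names TraceBuffer, record_frame and capture_current_context are illustrative only and do not come from OpenMRN or libgcc; the sketch assumes a GCC-compatible toolchain that ships <unwind.h> and emits unwind tables.

```cpp
#include <unwind.h>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size buffer for the captured program counters.
constexpr std::size_t kMaxDepth = 16;

struct TraceBuffer
{
    std::uintptr_t pcs[kMaxDepth];
    std::size_t depth;
};

// Called by the unwinder once per frame, innermost frame first.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *context,
                                        void *arg)
{
    TraceBuffer *trace = static_cast<TraceBuffer *>(arg);
    if (trace->depth >= kMaxDepth)
    {
        // Buffer full: any value other than _URC_NO_REASON stops the walk.
        return _URC_END_OF_STACK;
    }
    trace->pcs[trace->depth++] =
        static_cast<std::uintptr_t>(_Unwind_GetIP(context));
    return _URC_NO_REASON; // keep walking towards the outermost frame
}

// Captures the call stack of the context this function is called from.
// There is no public parameter for supplying a different register set.
std::size_t capture_current_context(TraceBuffer *trace)
{
    trace->depth = 0;
    _Unwind_Backtrace(record_frame, trace);
    return trace->depth;
}
```

Because the starting registers are captured inside _Unwind_Backtrace itself, calling this from an interrupt handler yields the handler's stack, which is exactly the behaviour described in the question.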

Creating that context manually from the stacked exception structure is sufficient to fake the backtrace going from handler mode upwards through the call stack in most cases. Here is some example code (from https://github.com/bakerstu/openmrn/blob/62683863e8621cef35e94c9dcfe5abcaf996d7a2/src/freertos_drivers/common/cpu_profile.hxx#L162 ):

/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.
struct core_regs
{
    unsigned r[16];
};

/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.
typedef struct
{
    unsigned demand_save_flags;
    struct core_regs core;
} phase2_vrs;

/// We store what we know about the external context at interrupt entry in this
/// structure.
phase2_vrs main_context;
/// Saved value of the lr register at the exception entry.
unsigned saved_lr;

/// Takes registers from the core state and the saved exception context and
/// fills in the structure necessary for the LIBGCC unwinder.
void fill_phase2_vrs(volatile unsigned *fault_args)
{
    main_context.demand_save_flags = 0;
    main_context.core.r[0] = fault_args[0];
    main_context.core.r[1] = fault_args[1];
    main_context.core.r[2] = fault_args[2];
    main_context.core.r[3] = fault_args[3];
    main_context.core.r[12] = fault_args[4];
    // We add +2 here because first thing libgcc does with the lr value is
    // subtract two, presuming that lr points to after a branch
    // instruction. However, exception entry's saved PC can point to the first
    // instruction of a function and we don't want to have the backtrace end up
    // showing the previous function.
    main_context.core.r[14] = fault_args[6] + 2;
    main_context.core.r[15] = fault_args[6];
    saved_lr = fault_args[5];
    main_context.core.r[13] = (unsigned)(fault_args + 8); // stack pointer
}
extern "C"
{
    _Unwind_Reason_Code __gnu_Unwind_Backtrace(
        _Unwind_Trace_Fn trace, void *trace_argument, phase2_vrs *entry_vrs);
}

/// Static variable for trace_func.
void *last_ip;

/// Callback from the unwind backtrace function.
_Unwind_Reason_Code trace_func(struct _Unwind_Context *context, void *arg)
{
    void *ip;
    ip = (void *)_Unwind_GetIP(context);
    if (strace_len == 0)
    {
        // stacktrace[strace_len++] = ip;
        // By taking the beginning of the function for the immediate interrupt
        // we will attempt to coalesce more traces.
        // ip = (void *)_Unwind_GetRegionStart(context);
    }
    else if (last_ip == ip)
    {
        if (strace_len == 1 && saved_lr != _Unwind_GetGR(context, 14))
        {
            _Unwind_SetGR(context, 14, saved_lr);
            allocator.singleLenHack++;
            return _URC_NO_REASON;
        }
        return _URC_END_OF_STACK;
    }
    if (strace_len >= MAX_STRACE - 1)
    {
        ++allocator.limitReached;
        return _URC_END_OF_STACK;
    }
    // stacktrace[strace_len++] = ip;
    last_ip = ip;
    ip = (void *)_Unwind_GetRegionStart(context);
    stacktrace[strace_len++] = ip;
    return _URC_NO_REASON;
}

/// Called from the interrupt handler to take a CPU trace for the current
/// exception.
void take_cpu_trace()
{
    memset(stacktrace, 0, sizeof(stacktrace));
    strace_len = 0;
    last_ip = nullptr;
    phase2_vrs first_context = main_context;
    __gnu_Unwind_Backtrace(&trace_func, 0, &first_context);
    // This is a workaround for the case when the function in which we had the
    // exception trigger does not have a stack saved LR. In this case the
    // backtrace will fail after the first step. We manually append the second
    // step to have at least some idea of what's going on.
    if (strace_len == 1)
    {
        main_context.core.r[14] = saved_lr;
        main_context.core.r[15] = saved_lr;
        __gnu_Unwind_Backtrace(&trace_func, 0, &main_context);
    }
    unsigned h = hash_trace(strace_len, (unsigned *)stacktrace);
    struct trace *t = find_current_trace(h);
    if (!t)
    {
        t = add_new_trace(h);
    }
    if (t)
    {
        t->total_size += 1;
    }
}

/// Change this value to runtime disable and enable the CPU profile gathering
/// code.
bool enable_profiling = 0;

/// Helper function to declare the CPU usage tick interrupt.
/// @param irq_handler_name is the name of the interrupt to declare, for example
/// timer4a_interrupt_handler.
/// @param CLEAR_IRQ_FLAG is a c++ statement or statements in { ... } that will
/// be executed before returning from the interrupt to clear the timer IRQ flag.
#define DEFINE_CPU_PROFILE_INTERRUPT_HANDLER(irq_handler_name, CLEAR_IRQ_FLAG) \
    extern "C"                                                                 \
    {                                                                          \
        void __attribute__((__noinline__)) load_monitor_interrupt_handler(     \
            volatile unsigned *exception_args, unsigned exception_return_code) \
        {                                                                      \
            if (enable_profiling)                                              \
            {                                                                  \
                fill_phase2_vrs(exception_args);                               \
                take_cpu_trace();                                              \
            }                                                                  \
            cpuload_tick(exception_return_code & 4 ? 0 : 255);                 \
            CLEAR_IRQ_FLAG;                                                    \
        }                                                                      \
        void __attribute__((__naked__)) irq_handler_name(void)                 \
        {                                                                      \
            __asm volatile("mov  r0, %0 \n"                                    \
                           "str  r4, [r0, 4*4] \n"                             \
                           "str  r5, [r0, 5*4] \n"                             \
                           "str  r6, [r0, 6*4] \n"                             \
                           "str  r7, [r0, 7*4] \n"                             \
                           "str  r8, [r0, 8*4] \n"                             \
                           "str  r9, [r0, 9*4] \n"                             \
                           "str  r10, [r0, 10*4] \n"                           \
                           "str  r11, [r0, 11*4] \n"                           \
                           "str  r12, [r0, 12*4] \n"                           \
                           "str  r13, [r0, 13*4] \n"                           \
                           "str  r14, [r0, 14*4] \n"                           \
                           :                                                   \
                           : "r"(main_context.core.r)                          \
                           : "r0");                                            \
            __asm volatile(" tst   lr, #4               \n"                    \
                           " ite   eq                   \n"                    \
                           " mrseq r0, msp              \n"                    \
                           " mrsne r0, psp              \n"                    \
                           " mov r1, lr \n"                                    \
                           " ldr r2,  =load_monitor_interrupt_handler  \n"     \
                           " bx  r2  \n"                                       \
                           :                                                   \
                           :                                                   \
                           : "r0", "r1", "r2");                                \
        }                                                                      \
    }

This code is designed to take a CPU profile using a timer interrupt, but the backtrace unwinding can be reused from any handler, including fault handlers. Read the code from the bottom to the top:

It is important that the IRQ function be defined with the attribute __naked__; otherwise the function entry header generated by GCC will manipulate the state of the CPU in unpredictable ways, modifying the stack pointer for example.

The first thing we do is save all other core registers that are not in the exception entry struct. We need to do this from assembly right at the beginning, because these registers will typically be modified by later C code when they are used as temporary registers.

Then we reconstruct the stack pointer from before the interrupt; the code will work whether the processor was in handler or thread mode before. This pointer is the exception entry structure. This code does not handle stacks that are not 4-byte aligned, but I never saw armgcc do that anyway.

The rest of the code is in C/C++: we fill in the internal structure we took from libgcc, then call the internal implementation of the unwinding process. There are some adjustments we need to make to work around certain assumptions of libgcc that do not hold upon exception entry.

There is one specific situation where the unwinding does not work, which is if the exception happened in a leaf function that does not save LR to the stack upon entry. This never happens when you try to do a backtrace from process mode, because the backtrace function being called will ensure that the calling function is not a leaf. I tried to apply some workarounds by adjusting the LR register during the backtracing process itself, but I'm not convinced it works every time. I'm interested in suggestions on how to do this better.
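
The stack selection and the "frame + 8 words" arithmetic can be written out in plain C++ as a sketch. The bit positions below are my reading of the ARMv7-M exception model (EXC_RETURN bit 2 selects MSP or PSP, EXC_RETURN bit 4 distinguishes the basic 8-word frame from the 26-word floating-point frame, stacked xPSR bit 9 flags an alignment padding word); the function and constant names are hypothetical. The quoted code handles only the basic, unpadded case.

```cpp
#include <cstdint>

// EXC_RETURN bit 2: 1 means the exception frame was pushed on the process
// stack (PSP), 0 means it was pushed on the main stack (MSP).
constexpr std::uint32_t kExcReturnUsesPsp = 1u << 2;
// EXC_RETURN bit 4: 1 means a basic frame, 0 means FP registers were stacked.
constexpr std::uint32_t kExcReturnBasicFrame = 1u << 4;
// Stacked xPSR bit 9: 1 means the hardware inserted one padding word.
constexpr std::uint32_t kXpsrAlignPadding = 1u << 9;

constexpr std::uint32_t kBasicFrameBytes = 8 * 4;     // r0-r3, r12, lr, pc, xPSR
constexpr std::uint32_t kExtendedFrameBytes = 26 * 4; // basic + s0-s15, FPSCR, reserved

// Mirrors the "tst lr, #4" test in the naked handler.
bool frame_is_on_psp(std::uint32_t exc_return)
{
    return (exc_return & kExcReturnUsesPsp) != 0;
}

// Returns the value SP had just before exception entry, given the address
// of the stacked frame. The quoted code computes only frame + 8 words.
std::uint32_t pre_exception_sp(std::uint32_t frame_address,
                               std::uint32_t exc_return,
                               std::uint32_t stacked_xpsr)
{
    std::uint32_t sp = frame_address;
    sp += (exc_return & kExcReturnBasicFrame) ? kBasicFrameBytes
                                              : kExtendedFrameBytes;
    if (stacked_xpsr & kXpsrAlignPadding)
    {
        sp += 4; // skip the padding word added on entry
    }
    return sp;
}
```

For a watchdog interrupt that preempted a FreeRTOS task, frame_is_on_psp would be expected to return true, since tasks on the Cortex-M ports run on the process stack.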
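
To make the index arithmetic in fill_phase2_vrs easier to follow, here is the same mapping with named offsets, as a host-side sketch. The enum and the names CoreRegs and seed_registers_from_frame are hypothetical; the frame order (r0-r3, r12, lr, pc, xPSR) is the basic Cortex-M exception frame, and the +2 adjustment is copied from the quoted code, not derived independently.

```cpp
#include <cstdint>

// Word offsets inside the basic exception frame pushed by the hardware.
enum FrameIndex
{
    kFrameR0 = 0,
    kFrameR1 = 1,
    kFrameR2 = 2,
    kFrameR3 = 3,
    kFrameR12 = 4,
    kFrameLr = 5,
    kFramePc = 6,
    kFrameXpsr = 7
};

// Hypothetical stand-in for the core register part of libgcc's phase2_vrs.
struct CoreRegs
{
    std::uint32_t r[16];
};

// Seeds the caller-saved registers, SP, LR and PC from the stacked frame.
// r4-r11 are left untouched; the quoted code captures them in assembly.
// Returns the interrupted function's real LR for the later leaf workaround.
std::uint32_t seed_registers_from_frame(const std::uint32_t *frame,
                                        std::uint32_t sp_before_exception,
                                        CoreRegs *regs)
{
    regs->r[0] = frame[kFrameR0];
    regs->r[1] = frame[kFrameR1];
    regs->r[2] = frame[kFrameR2];
    regs->r[3] = frame[kFrameR3];
    regs->r[12] = frame[kFrameR12];
    regs->r[13] = sp_before_exception;
    // The unwinder treats r14 as a return address and subtracts 2 from it,
    // so the interrupted PC is stored with +2 to cancel that out.
    regs->r[14] = frame[kFramePc] + 2;
    regs->r[15] = frame[kFramePc];
    return frame[kFrameLr];
}
```

On the target, frame would be the pointer chosen by the naked handler and sp_before_exception would be the address just past the frame.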
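
One simpler fallback for the leaf-function case, sketched here as an assumption and not as what OpenMRN does, is to skip the second unwinder run and just append the stacked LR when only one frame was recovered. The name append_lr_fallback is hypothetical. Unlike the quoted code, this records a return address inside the caller, not the caller's start address, and it cannot tell a leaf function apart from a genuinely outermost frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// If the unwinder produced exactly one entry, append the return address
// taken from the LR value stacked at exception entry. Returns the new depth.
std::size_t append_lr_fallback(std::uint32_t *trace, std::size_t depth,
                               std::size_t capacity, std::uint32_t saved_lr)
{
    assert(depth <= capacity);
    if (depth == 1 && depth < capacity)
    {
        // Bit 0 of LR is the Thumb state bit, not part of the address.
        trace[depth++] = saved_lr & ~static_cast<std::uint32_t>(1);
    }
    return depth;
}
```

A trace that already has two or more entries is returned unchanged, so the helper can be called unconditionally after the backtrace.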
