Does this sound like a stack overflow?

Question

I think I might be having a stack overflow problem or something similar in my embedded firmware code. I am a new programmer and have never dealt with a stack overflow (SO), so I'm not sure whether that is what's happening.

The firmware controls a device with a wheel that has magnets evenly spaced around it, and the board has a Hall effect sensor that senses when a magnet is over it. My firmware operates the stepper and also counts steps while monitoring the magnet sensor in order to detect whether the wheel has stalled.

I am using a timer interrupt on my chip (8-bit, 8057 arch.) to set output ports to control the motor and for the stall detection. The stall detection code looks like this:

    //   Enter ISR
    //   Change the ports to the appropriate value for the next step
    //    ...

    StallDetector++;      // Increment the stall detector

    if(PosSensor != LastPosMagState)
    {
        StallDetector = 0;

        LastPosMagState = PosSensor;
    }
    else
    {
        if (PosSensor == ON) 
        {
            if (StallDetector > (MagnetSize + 10))
            {
                HandleStallEvent();
            }
        }
        else if (PosSensor == OFF) 
        {
            if (StallDetector > (GapSize + 10))
            {
                HandleStallEvent();
            }
        }
    }

This code is called every time the ISR is triggered. PosSensor is the magnet sensor. MagnetSize is the number of stepper steps that it takes to get through the magnet field. GapSize is the number of steps between two magnets. So I want to detect whether the wheel gets stuck, either with the sensor over a magnet or not over a magnet.

This works great for a long time, but then after a while the first stall event occurs because 'StallDetector > (MagnetSize + 10)', and when I look at the value of StallDetector it is always around 220! This doesn't make sense, because MagnetSize is always around 35, so the stall event should have been triggered at about 46, but somehow the counter got all the way up to 220. And I don't set the value of StallDetector anywhere else in my code.

Do you have any advice on how I can track down the root of this problem?

The ISR looks like this:

    void Timer3_ISR(void) interrupt 14
    {
        OperateStepper();  // This is the function shown above
        TMR3CN &= ~0x80;   // Clear Timer3 interrupt flag
    }

HandleStallEvent just sets a few variables back to their default values so that it can attempt another move:

    #pragma save
    #pragma nooverlay
    void HandleStallEvent()
    {
    ///*
        PulseMotor = 0;                 //Stop the wheel from moving
        SetMotorPower(0);               //Set motor power low
        MotorSpeed = LOW_SPEED;
        SetSpeedHz();
        ERROR_STATE = 2;
        DEVICE_IS_HOMED = FALSE;
        DEVICE_IS_HOMING = FALSE;
        DEVICE_IS_MOVING = FALSE;
        HOMING_STATE = 0;
        MOVING_STATE = 0;
        CURRENT_POSITION = 0;
        StallDetector = 0;
        return;
    //*/
    }
    #pragma restore

Answer 1

Is PosSensor volatile? That is, do you update PosSensor somewhere, or is it directly reading a GPIO?

I assume GapSize is rather large (> 220?). It sounds to me like you might have a race condition.

    // PosSensor == OFF, LastPosMagState == OFF
    if(PosSensor != LastPosMagState)
    {
        StallDetector = 0;

        LastPosMagState = PosSensor;
    }
    else
    {
        // Race Condition: PosSensor turns ON here
        // while LastPosMagState still == OFF
        if (PosSensor == ON)
        {
            if (StallDetector > (MagnetSize + 10))
            {
                HandleStallEvent();
            }
        }
        else if (PosSensor == OFF)
        {
            if (StallDetector > (GapSize + 10))
            {
                HandleStallEvent();
            }
        }
    }

You should cache the value of PosSensor once, right after doing StallDetector++, so that if PosSensor changes while your code is running, you don't start testing the new value.
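As a rough sketch of that caching idea, the check could read the sensor into a local variable once and use only that copy afterwards. The function name `CheckForStall`, the `stall_events` counter, the 16-bit counter width, and the sample values for `MagnetSize` and `GapSize` are assumptions made for illustration; they are not taken from the question's firmware.

```c
#include <stdint.h>

#define ON  1
#define OFF 0

/* Hypothetical stand-ins for the question's globals. */
volatile uint8_t PosSensor = OFF;      /* assumed to mirror the Hall sensor input */
static uint8_t  LastPosMagState = OFF;
static uint16_t StallDetector = 0;     /* 16-bit width is an assumption */
static uint16_t MagnetSize = 35;       /* sample value */
static uint16_t GapSize = 300;         /* sample value */
static uint16_t stall_events = 0;      /* illustration only: counts detected stalls */

static void HandleStallEvent(void)
{
    stall_events++;
    StallDetector = 0;
}

void CheckForStall(void)
{
    uint8_t sensor = PosSensor;        /* read the volatile input exactly once */
    uint16_t limit;

    StallDetector++;

    if (sensor != LastPosMagState) {
        /* The wheel moved past an edge: restart the count. */
        StallDetector = 0;
        LastPosMagState = sensor;
        return;
    }

    /* Every later test uses the same snapshot, so a change in PosSensor
       during this call cannot select the wrong limit. */
    limit = (sensor == ON) ? (MagnetSize + 10) : (GapSize + 10);
    if (StallDetector > limit) {
        HandleStallEvent();
    }
}
```

With this shape, a sensor edge that arrives in the middle of the routine is simply picked up on the next call.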

Answer 2

Does HandleStallEvent() "look at" StallDetector within the ISR, or does it trigger something on the main loop? If it's on the main loop, are you clearing the interrupt bit?

Or are you looking at StallDetector from a debugger outside the ISR? Then a retriggered interrupt would use the correct value each time but execute too many times, and you would only see the final, inflated value.

On second thought, it is more likely that you don't have to clear an interrupt-generating register, but rather that the interrupt pin is being held asserted by the sensor. You need to ignore the interrupt after it is first handled until the line deasserts, for example by having the original ISR disable itself and be re-enabled by a second ISR that handles the 1->0 transition.

You might then also need to add debouncing hardware, or adjust it if you already have it.
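The "handle once, then wait for the line to deassert" idea can be sketched in software as a small latch. This is only an illustration of the logic, not of any particular chip's interrupt controller; `OnInterruptLine`, `armed`, and `handled_count` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

static bool armed = true;              /* true while waiting for the next assertion */
static uint16_t handled_count = 0;     /* illustration only: events actually handled */

/* Call with the current level of the (hypothetical) interrupt line. */
void OnInterruptLine(bool line_asserted)
{
    if (line_asserted) {
        if (armed) {
            armed = false;             /* ignore repeats until the line drops */
            handled_count++;           /* the real handling would go here */
        }
    } else {
        armed = true;                  /* 1->0 transition seen: re-arm */
    }
}
```

Feeding this the sequence asserted, asserted, asserted, deasserted, asserted handles the event twice rather than four times.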

Answer 3

This is definitely not stack overflow. If you blew the stack (overflowed it), your application would simply crash. This sounds more like something we used to call memory stomping in my C++ days. You may not be accessing the memory location that the StallDetector value occupies via the StallDetector variable alone. There may be another part of your code "stomping" on this particular memory location erroneously.

Unfortunately, this kind of issue is very hard to track down. About the only thing you can do is systematically isolate (remove from execution) chunks of your code until you narrow it down and find the bug.

Answer 4

Do you have nested ISRs on your system? It could be something along the lines of: your ISR starts and increments your count, then it gets interrupted and does it again. Do this enough times and your interrupt stack can overflow. It could also explain such a high counter value.
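One way to test the nesting theory is to count how deep the ISR is entered. The sketch below assumes you can add a call at the top and bottom of the ISR; `IsrEnter`, `IsrExit`, and the three counters are hypothetical names, not part of the question's code.

```c
#include <stdint.h>

static volatile uint8_t  isr_depth = 0;       /* current nesting level */
static volatile uint8_t  max_isr_depth = 0;   /* deepest level seen so far */
static volatile uint16_t nested_entries = 0;  /* entries made while already inside */

/* Call as the first statement of the ISR. */
void IsrEnter(void)
{
    isr_depth++;
    if (isr_depth > 1) {
        nested_entries++;                     /* the ISR interrupted itself */
    }
    if (isr_depth > max_isr_depth) {
        max_isr_depth = isr_depth;
    }
}

/* Call as the last statement of the ISR. */
void IsrExit(void)
{
    isr_depth--;
}
```

If `max_isr_depth` never rises above 1 after a long run, nested entry is unlikely to be the cause.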

Answer 5

Check your parameter types. If you defined the parameters differently from what the caller expects, then calling your function could overwrite the space that the variable is stored in. (For instance, if you wrote the function expecting an int but the caller is pushing a long onto the stack.)

Answer 6

You could see what additional options your debugger supports. In Visual Studio, for example, it is possible to set a "data breakpoint", where you break when a memory location changes (or is set to a certain value, or goes above a threshold, ...).

If something like this is possible in your case, you could see where the data is changed and whether something else is writing to the memory erroneously.
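If the debugger has no data breakpoints, a software check can approximate one: keep an inverted shadow copy that only the intended code path updates, and compare the two at the start of each ISR. This is a minimal sketch under that assumption; `stall_shadow`, `StallDetectorSet`, `StallDetectorIntact`, `StallDetectorCheck`, and `corruption_count` are hypothetical names, and the 16-bit width is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint16_t StallDetector = 0;
static volatile uint16_t stall_shadow = 0xFFFFu;  /* kept equal to ~StallDetector */
static volatile uint16_t corruption_count = 0;    /* illustration only */

/* The only intended way to write the counter. */
void StallDetectorSet(uint16_t v)
{
    StallDetector = v;
    stall_shadow = (uint16_t)~v;
}

/* Returns true while the counter still matches its shadow copy. */
bool StallDetectorIntact(void)
{
    return StallDetector == (uint16_t)~stall_shadow;
}

/* Call at the start of the ISR; a breakpoint on the increment below
   stops close to the moment a stray write is first noticed. */
void StallDetectorCheck(void)
{
    if (!StallDetectorIntact()) {
        corruption_count++;
    }
}
```

A stray write that changes `StallDetector` without going through `StallDetectorSet` makes the check fail on the next ISR entry.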

Answer 1: score 2, accepted, 2010-04-16 17:59:13
Answer 2: score 1, 2010-04-16 17:16:54
Answer 3: score 1, 2010-04-16 17:23:27
Answer 4: score 1, 2010-04-16 17:23:30
Answer 5: score 1, 2010-04-16 17:47:14
Answer 6: score 1, 2010-04-16 17:52:38
