
Increasing performance of 32bit math on 16bit processor

I am working on firmware for an embedded device that uses a 16-bit PIC running at 40 MIPS, programmed in C. The system controls the position of two stepper motors and must maintain the step position of each motor at all times. The maximum position of each motor is around 125000 steps, so I cannot use a 16-bit integer to track the position; I must use a 32-bit unsigned integer (DWORD). The motor moves at 1000 steps per second, and I have designed the firmware so that steps are processed in a timer ISR. The timer ISR does the following:

1) Compare the current position of one motor to the target position. If they are the same, set the isMoving flag false and return. If they are different, set the isMoving flag true.

2) If the target position is larger than the current position, move one step forward, then increment the current position.

3) If the target position is smaller than the current position, move one step backward, then decrement the current position.

Here is the code:

void _ISR _NOPSV _T4Interrupt(void)
{
    static char StepperIndex1 = 'A';    

    if(Device1.statusStr.CurrentPosition == Device1.statusStr.TargetPosition)
    {
        Device1.statusStr.IsMoving = 0;
        // Do Nothing
    }   
    else if (Device1.statusStr.CurrentPosition > Device1.statusStr.TargetPosition)
    {
        switch (StepperIndex1)      // MOVE OUT
        {
            case 'A':
                SetMotor1PosB();
                StepperIndex1 = 'B';
                break;
            case 'B':
                SetMotor1PosC();
                StepperIndex1 = 'C';
                break;
            case 'C':
                SetMotor1PosD();
                StepperIndex1 = 'D';
                break;
            case 'D':
            default:
                SetMotor1PosA();
                StepperIndex1 = 'A';
                break;      
        }
        Device1.statusStr.CurrentPosition--;    
        Device1.statusStr.IsMoving = 1;
    }   
    else
    {
        switch (StepperIndex1)      // MOVE IN 
        {
            case 'A':
                SetMotor1PosD();
                StepperIndex1 = 'D';
                break;
            case 'B':
                SetMotor1PosA();
                StepperIndex1 = 'A';
                break;
            case 'C':
                SetMotor1PosB();
                StepperIndex1 = 'B';
                break;
            case 'D':
            default:
                SetMotor1PosC();
                StepperIndex1 = 'C';
                break;      
        }
        Device1.statusStr.CurrentPosition++;
        Device1.statusStr.IsMoving = 1;
    }   
    _T4IF = 0;          // Clear the Timer 4 Interrupt Flag.
}

The target position is set in the main program loop when move requests are received. The SetMotorPos lines are just macros that turn specific port pins on or off.

My question is: is there any way to improve the efficiency of this code? The code works fine as is when the positions are 16-bit integers, but with 32-bit integers there is too much processing. This device must communicate with a PC without hesitation, and as written there is a noticeable performance hit. I really only need 18-bit math, but I don't know of an easy way to do that! Any constructive input or suggestions would be most appreciated.

Warning: all numbers are made up...

Suppose the ISR above compiles to about 200 instructions (likely fewer), including the instructions that save and restore the CPU registers on entry and exit, and that each instruction takes 5 clock cycles (likely 1 to 3). If you run 2 such ISRs 1000 times a second each, that comes to 2*1000*200*5 = 2 million clock cycles per second, or 2 MIPS.

Do you actually consume the remaining 38 MIPS elsewhere?
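The estimate above can be written out as a small calculation. This is only a sketch: the function names are made up for illustration, and the inputs are the same guessed figures (200 instructions, 5 cycles each), not measurements.

```c
#include <stdint.h>

/* Cycles per second consumed by periodic ISRs. All inputs are estimates. */
static uint32_t isr_cycles_per_second(uint32_t isr_count,
                                      uint32_t calls_per_second,
                                      uint32_t instructions_per_call,
                                      uint32_t cycles_per_instruction)
{
    return isr_count * calls_per_second *
           instructions_per_call * cycles_per_instruction;
}

/* Instruction cycles available between two timer ticks. */
static uint32_t cycles_per_tick(uint32_t cpu_cycles_per_second,
                                uint32_t ticks_per_second)
{
    return cpu_cycles_per_second / ticks_per_second;
}
```

With the guessed figures, isr_cycles_per_second(2, 1000, 200, 5) gives 2000000, and cycles_per_tick(40000000, 1000) gives 40000 cycles between steps on a 40 MIPS part.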

The only thing that may matter here, and that I can't see, is what is done inside the SetMotor*Pos*() functions. Do they do any complex calculations? Do they perform some slow communication with the motors, e.g. wait for them to respond to the commands sent to them?

At any rate, it's doubtful that such simple code would be noticeably slower with 32-bit integers than with 16-bit ones.

If your code is slow, profile it: find out where the time is spent and how much. Generate a square pulse in the ISR (drive a pin to 1 when the ISR starts and to 0 when it is about to return) and measure its duration with an oscilloscope. Or do whatever makes it easier to find out. Measure the time spent in all parts of the program, then optimize where it is really necessary, not where you previously assumed it would be.
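A minimal sketch of the timing pulse, assuming a spare output pin is available. Here profile_pin is a plain variable standing in for the real port latch bit, and timer_isr_body and isr_runs are hypothetical names; on the device the two macros would write to whichever pin you pick.

```c
#include <stdint.h>

/* Stand-in for an output latch bit. On the real device this would be a
   spare port pin; the name profile_pin is hypothetical. */
static volatile uint8_t profile_pin;
static volatile uint32_t isr_runs;

#define PROFILE_PIN_HIGH() (profile_pin = 1)
#define PROFILE_PIN_LOW()  (profile_pin = 0)

/* Body of the timer ISR, wrapped in the timing pulse. */
static void timer_isr_body(void)
{
    PROFILE_PIN_HIGH();   /* rising edge: ISR body entered */

    isr_runs++;           /* placeholder for the real stepping work */

    PROFILE_PIN_LOW();    /* falling edge: ISR about to return */
}
```

The pulse width on the scope covers only the code between the two writes, so the register save and restore that the compiler adds around the ISR are not included in it.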

The difference between 16-bit and 32-bit arithmetic shouldn't be that big, I think, since you only use increment and comparison. But maybe the problem is that each 32-bit arithmetic operation implies a function call (if the compiler isn't able or willing to inline the simpler operations).

One suggestion would be to do the arithmetic yourself by splitting Device1.statusStr.CurrentPosition in two, say Device1.statusStr.CurrentPositionH and Device1.statusStr.CurrentPositionL. Then use some macros to do the operations, like:

#define INC(xH,xL) do { (xL)++; if ((xL) == 0) (xH)++; } while (0)
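The same idea extends to decrement and comparison, which the ISR also needs. The following is a sketch only: SplitPos and the split_* helpers are hypothetical names, not part of the original code, and whether this beats the compiler's own 32-bit code is something to measure rather than assume.

```c
#include <stdint.h>

/* Hypothetical position stored as two 16-bit halves. */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} SplitPos;

/* Increment: carry into the high half when the low half wraps to 0. */
static void split_inc(SplitPos *p)
{
    if (++p->lo == 0)
        p->hi++;
}

/* Decrement: borrow from the high half when the low half was 0. */
static void split_dec(SplitPos *p)
{
    if (p->lo-- == 0)
        p->hi--;
}

/* Returns -1, 0 or 1 when a is below, equal to or above b. */
static int split_cmp(const SplitPos *a, const SplitPos *b)
{
    if (a->hi != b->hi)
        return (a->hi < b->hi) ? -1 : 1;
    if (a->lo != b->lo)
        return (a->lo < b->lo) ? -1 : 1;
    return 0;
}
```

The comparison checks the high halves first, so most calls only touch the low half when the two positions are within 65536 steps of each other.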

I would get rid of the StepperIndex1 variable and instead use the two low-order bits of CurrentPosition to keep track of the current step index. Alternatively, keep track of the current position in full rotations (rather than individual steps), so that it fits in a 16-bit variable. When moving, you only increment or decrement the position when moving to phase 'A'. Of course, this means you can only target full rotations rather than individual steps.
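A sketch of the first suggestion, deriving the phase from the two low-order bits of the position. The names phase_of, drive_phase, step_toward and last_phase_driven are hypothetical; on the device, drive_phase would be a switch that calls SetMotor1PosA() through SetMotor1PosD(). Which phase number maps to which coil state, and therefore which direction counts as forward, is an assumption that depends on the wiring.

```c
#include <stdint.h>

/* Phase index 0..3 taken from the two low-order bits of the position. */
static uint8_t phase_of(uint32_t position)
{
    return (uint8_t)(position & 0x3u);
}

/* Records the last phase driven; a stand-in for the port pin macros. */
static uint8_t last_phase_driven;

static void drive_phase(uint8_t phase)
{
    last_phase_driven = phase;
}

/* Move one step toward target; returns 1 while moving, 0 when on target. */
static int step_toward(uint32_t *current, uint32_t target)
{
    if (*current == target)
        return 0;
    if (*current < target)
        (*current)++;
    else
        (*current)--;
    drive_phase(phase_of(*current));   /* phase follows the new position */
    return 1;
}
```

Because the phase is recomputed from the position on every step, the position and the coil state cannot drift apart the way a separate index variable could.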

Sorry, but you are using bad program design.

Let's check the difference between 16-bit and 32-bit PIC24 or PIC33 asm code...

16-bit increment

inc    PosInt16               ;one cycle

So a 16-bit increment takes one cycle.

32-bit increment

clr    Wd                     ;one cycle
inc    low PosInt32           ;one cycle
addc   high PosInt32, Wd      ;one cycle

and a 32-bit increment takes three cycles. The total difference is 2 cycles, or 50 ns (nanoseconds).

A simple calculation shows it all. You have 1000 steps per second and a 40 MIPS DSP, so you have 40000 instructions available per step at 1000 steps per second. More than enough!

When you changed it from 16-bit to 32-bit, did you change any of the compile flags to tell the compiler to build it as a 32-bit application instead?

Have you tried compiling with the 32-bit extensions but using only 16-bit integers? Do you still get such a performance drop?

It's likely that just changing from 16-bit to 32-bit causes some operations to be compiled differently. Perhaps diff the two sets of compiled ASM code and see what is actually different: is it a lot, or only a couple of lines?

A solution might be to use two 16-bit integers instead of one 32-bit integer: when valueA reaches int16.Max, set it to 0 and increment valueB by 1; otherwise just increment valueA by 1. When valueB is >= 3, you then check valueA >= 26696 (or something similar, depending on whether you use unsigned or signed int16), and that gives you your motor check at 125000.
