ARM without ALU: how many instructions do operations take?
I have a problem on an ARM Cortex-M3 with some functionality that requires multiplying and dividing natural numbers. The point is: is it possible to do calculations between 128-bit numbers? I need to know how many assembly operations it takes to emulate in software the multiplication of two 32-bit numbers and the division of two 32-bit numbers, so that I can then estimate the time consumption of my calculations. I am stuck on this calculation and need help, because I may have to change my uC to a Cortex-M4 with a hardware ALU.
Can you help me with this?
Given this simple C code:
uint32_t var1 = 12304;
uint32_t var2 = 1892637198;
uint64_t result = var1*var2;
And the objdump disassembly:
0: b480 push {r7}
2: b085 sub sp, #20
4: af00 add r7, sp, #0
uint32_t var1 = 12304;
6: f243 0310 movw r3, #12304 ; 0x3010
a: 60fb str r3, [r7, #12]
uint32_t var2 = 1892637198;
c: f645 230e movw r3, #23054 ; 0x5a0e
10: f2c7 03cf movt r3, #28879 ; 0x70cf
14: 60bb str r3, [r7, #8]
uint64_t result = var1*var2;
16: 68fb ldr r3, [r7, #12]
18: 68ba ldr r2, [r7, #8]
1a: fb02 f103 mul.w r1, r2, r3
1e: 460a mov r2, r1
20: f04f 0300 mov.w r3, #0
24: e9c7 2300 strd r2, r3, [r7]
So if I count, for example, several multiplications, should I count the whole instruction sequence (including loading the values into registers, so plus 3 instructions for each load) or only the multiplication (in this case 6 instructions)?
Because multiplying two 128-bit variables formatted as (x^5+x), where x is a 32-bit variable, gives me (a+b)(c+d) = ab+ad+bc+bd, i.e. 4 multiplications (or 3 by using algorithms). So should I calculate 4*(3+3+6) or 4*(6+?+?)?
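For reference, here is a minimal sketch of a 128-bit by 128-bit multiply built from 32-bit limbs, not a tuned implementation. Two things are worth noting. First, `var1*var2` as written above is a 32-bit multiply whose result is only widened afterwards (the `mul.w` followed by a zeroed high word in the disassembly shows this), so each partial product needs a cast to `uint64_t`. Second, with four 32-bit limbs per operand the schoolbook method takes 16 partial products; the 4-multiplication expansion applies to operands split into two parts. The name `mul128`, the `LIMBS` macro and the least-significant-limb-first layout are assumptions made for this example.

```c
#include <stdint.h>

#define LIMBS 4  /* 4 x 32 bits = 128 bits, least significant limb first */

/* Schoolbook multiply: out (8 limbs, 256 bits) = a * b. */
void mul128(const uint32_t a[LIMBS], const uint32_t b[LIMBS],
            uint32_t out[2 * LIMBS])
{
    for (int i = 0; i < 2 * LIMBS; i++)
        out[i] = 0;

    for (int i = 0; i < LIMBS; i++) {
        uint32_t carry = 0;
        for (int j = 0; j < LIMBS; j++) {
            /* The cast makes this a 32x32->64 multiply; without it the
               product would be truncated to 32 bits. The sum cannot
               overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;        /* low half stays in place */
            carry = (uint32_t)(t >> 32);     /* high half moves up one limb */
        }
        out[i + LIMBS] = carry;              /* limb not yet written in this row */
    }
}
```

The Cortex-M3 instruction set includes `UMULL`, so a compiler will typically turn each widening multiply in the inner loop into a single instruction rather than a software routine; the per-row cost is then that multiply plus the additions and the loads and stores around it.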
This page contains the cycle counts per instruction for the ARM Cortex-M series processors. If you have the assembly code (which it sounds like you do), it should be easy enough to add up all of the cycles, multiply by 1/clock_freq, and get the total time spent in different scenarios.
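As a rough illustration of that last step, the conversion from a summed cycle count to time could look like the sketch below. The function name `cycles_to_ns` is made up for this example, and the cycle total has to come from your own count against the timing table; nothing here is measured.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds: time = cycles * (1 / clock_freq).
   Returns 0 for a zero clock to avoid dividing by zero.
   Note: cycles * 1e9 overflows 64 bits above roughly 1.8e10 cycles. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_hz)
{
    if (clock_hz == 0)
        return 0;
    return cycles * 1000000000ULL / clock_hz;
}
```

For example, with an assumed 72 MHz core clock, `cycles_to_ns(72, 72000000u)` gives 1000 ns. Keep in mind that summed per-instruction counts ignore effects such as flash wait states, so treat the figure as an estimate.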
Another solution is to use SysTick to measure the cycle count.
See this link from ARM.
Edit: You can set the counter to its maximum so that it reloads that value each time it reaches 0.
// Configure SysTick
*STRVR = 0xFFFFFF; // max count
*STCVR = 0; // force a re-load of the counter value register
*STCSR = 5; // enable FCLK count without interrupt
You can read the STCVR register, which is a down-counter, before and after the function, and then subtract the overhead cycles (those spent reading the STCVR register itself).
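The subtraction itself can be sketched as below. This is only the arithmetic, written so it also runs off-target; `systick_elapsed` and `SYSTICK_MASK` are names invented for this example. It assumes the reload value is 0xFFFFFF as configured above and that the measured code takes fewer than 2^24 cycles, so the counter wraps at most once.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu  /* SysTick is a 24-bit down-counter */

/* Cycles elapsed between two STCVR readings, minus a known overhead.
   The counter counts down, so elapsed = before - after; masking to
   24 bits handles a single wrap from 0 back to 0xFFFFFF. */
uint32_t systick_elapsed(uint32_t before, uint32_t after, uint32_t overhead)
{
    uint32_t raw = (before - after) & SYSTICK_MASK;
    return raw > overhead ? raw - overhead : 0;
}
```

On the target you would pass in the two values read from `*STCVR` around the function call. The overhead can be estimated by taking two back-to-back readings with nothing in between and using their difference.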