I have a problem on an ARM Cortex-M3 with some functionality that requires multiplying and dividing natural numbers. The question is whether it is possible to do calculations on 128-bit numbers. I need to know how many assembly instructions are needed to emulate in software the multiplication of two 32-bit numbers and the division of two 32-bit numbers, so that I can calculate the time consumption of my calculations. I am stuck on this and need help, because I may have to change my uC to a Cortex-M4 with a hardware ALU.
Can you help me with this?
Given simple C code:
uint32_t var1 = 12304;
uint32_t var2 = 1892637198;
uint64_t result = var1*var2;
And the objdump disassembly:
0: b480 push {r7}
2: b085 sub sp, #20
4: af00 add r7, sp, #0
uint32_t var1 = 12304;
6: f243 0310 movw r3, #12304 ; 0x3010
a: 60fb str r3, [r7, #12]
uint32_t var2 = 1892637198;
c: f645 230e movw r3, #23054 ; 0x5a0e
10: f2c7 03cf movt r3, #28879 ; 0x70cf
14: 60bb str r3, [r7, #8]
uint64_t result = var1*var2;
16: 68fb ldr r3, [r7, #12]
18: 68ba ldr r2, [r7, #8]
1a: fb02 f103 mul.w r1, r2, r3
1e: 460a mov r2, r1
20: f04f 0300 mov.w r3, #0
24: e9c7 2300 strd r2, r3, [r7]
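One thing worth noting about the listing above: `var1*var2` is evaluated in 32 bits, which is why the disassembly shows a plain `mul.w` followed by zeroing the high word, so the upper half of the product is lost. A minimal sketch, assuming the full 64-bit product is what is wanted, casts one operand before multiplying; on a Cortex-M3 a compiler can then typically use a single `UMULL` instruction. The function name `mul32x32` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: full 32x32 -> 64-bit unsigned product.
 * The cast forces the multiplication to be done in 64 bits,
 * so the upper 32 bits of the result are kept. */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Counting the instructions of this version, rather than of the truncating one, is likely to give a more useful estimate for multi-word arithmetic.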
So if I calculate, for example, several multiplications, should I count the whole instruction sequence (including loading the values into registers, so plus 3 instructions for each load) or only the multiplication itself (in this case 6 instructions)?
Multiplying two 128-bit variables, each formatted as (x^5+x) where x is a 32-bit variable, gives (a+b)(c+d) = ac+ad+bc+bd, i.e. 4 multiplications (or 3 using an algorithm such as Karatsuba). So should I calculate 4*(3+3+6) or 4*(6+?+?)?
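To make the counting concrete, here is a rough sketch of a schoolbook 128-bit by 128-bit multiplication, assuming each operand is stored as four 32-bit limbs, least significant limb first. The names `LIMBS` and `mul128` are hypothetical. With this layout the inner statement runs 16 times, so a full 256-bit result costs 16 multiplications of 32x32 -> 64 bits plus the carry additions, and that inner statement is what is worth counting in the disassembly.

```c
#include <stdint.h>

#define LIMBS 4  /* hypothetical layout: 4 x 32 bits = 128 bits */

/* Schoolbook multiplication: out = a * b.
 * a and b hold LIMBS limbs each, out holds 2*LIMBS limbs,
 * all little-endian (limb 0 is the least significant). */
void mul128(const uint32_t a[LIMBS], const uint32_t b[LIMBS],
            uint32_t out[2 * LIMBS])
{
    for (int i = 0; i < 2 * LIMBS; i++) {
        out[i] = 0;
    }
    for (int i = 0; i < LIMBS; i++) {
        uint32_t carry = 0;
        for (int j = 0; j < LIMBS; j++) {
            /* 32x32 -> 64 product plus two 32-bit addends
             * always fits in 64 bits. */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;        /* low word stays in place */
            carry = (uint32_t)(t >> 32);     /* high word moves up */
        }
        out[i + LIMBS] = carry;
    }
}
```

If only the low 128 bits of the product are needed, the partial products with i + j >= LIMBS can be skipped, which reduces the count to 10 multiplications.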
This page contains all of the cycle counts per instruction for the ARM M-Series processor. If you have the assembly code (which it sounds like you do?) then it should be easy enough to add up all of your cycles, multiply by 1/clock_freq and get your total time spent for different scenarios.
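The conversion from a summed cycle count to time can be sketched as below. The function name `cycles_to_ns` is made up for illustration, and the core clock must be whatever your board actually runs at; integer arithmetic is used because the Cortex-M3 has no floating-point unit.

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to nanoseconds
 * for a core clock given in Hz (time = cycles / clock_freq). */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    return ((uint64_t)cycles * 1000000000ULL) / clock_hz;
}
```

For example, with an assumed 72 MHz clock, 72 cycles correspond to 1000 ns.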
Another solution is to use SysTick to measure the cycle count.
See this link from ARM
Edit: You can set the counter to its maximum value and force it to reload that value once it has reached 0.
// Configure SysTick
*STRVR = 0xFFFFFF; // max count
*STCVR = 0; // force a re-load of the counter value register
*STCSR = 5; // enable FCLK count without interrupt
You can read the STCVR register, which is a down-counter, before and after the function, and then subtract the overhead cycles (those needed to read the STCVR register itself).
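The subtraction itself can be sketched as follows. This is only the arithmetic on two counter readings, written so it does not touch hardware; the names `SYSTICK_MASK` and `systick_elapsed` are hypothetical. It assumes the reload value is 0xFFFFFF as configured above and that the counter wrapped at most once between the two readings.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu  /* the SysTick counter is 24 bits wide */

/* Hypothetical helper: cycles elapsed between two readings of a
 * 24-bit down-counter, minus a measured read overhead.
 * start is the reading taken before the code under test,
 * stop is the reading taken after it. */
uint32_t systick_elapsed(uint32_t start, uint32_t stop, uint32_t overhead)
{
    /* The counter counts down, so start - stop is the elapsed count;
     * masking to 24 bits handles a single wrap through 0. */
    uint32_t raw = (start - stop) & SYSTICK_MASK;
    return (raw > overhead) ? (raw - overhead) : 0u;
}
```

On the target, `start` and `stop` would be two reads of `*STCVR` placed around the function being measured, and `overhead` can be estimated by taking two back-to-back reads with nothing in between.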