
ARM without ALU: how many instructions do operations take?

I have a problem on an ARM Cortex-M3 with some functionality that requires multiplying and dividing natural numbers. The question is whether it is practical to do calculations on 128-bit numbers. I need to know how many assembly instructions it takes to emulate in software the multiplication of two 32-bit numbers and the division of two 32-bit numbers, so that I can estimate the time my calculations will consume. I am stuck and need help, because I may have to change my microcontroller to a Cortex-M4 with a hardware ALU.

Can you help me with this?

Given simple C code:

uint32_t var1 = 12304;
uint32_t var2 = 1892637198;
uint64_t result = var1*var2;

And the objdump disassembly:

   0:   b480            push    {r7}
   2:   b085            sub     sp, #20
   4:   af00            add     r7, sp, #0
        uint32_t var1 = 12304;
   6:   f243 0310       movw    r3, #12304      ; 0x3010
   a:   60fb            str     r3, [r7, #12]
        uint32_t var2 = 1892637198;
   c:   f645 230e       movw    r3, #23054      ; 0x5a0e
  10:   f2c7 03cf       movt    r3, #28879      ; 0x70cf
  14:   60bb            str     r3, [r7, #8]

        uint64_t result = var1*var2;
  16:   68fb            ldr     r3, [r7, #12]
  18:   68ba            ldr     r2, [r7, #8]
  1a:   fb02 f103       mul.w   r1, r2, r3
  1e:   460a            mov     r2, r1
  20:   f04f 0300       mov.w   r3, #0
  24:   e9c7 2300       strd    r2, r3, [r7]
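
One thing worth noting about the listing above: `var1*var2` is evaluated in 32 bits and only then widened, which is why the compiler emits a plain `mul.w` and stores zero as the upper word. Here is a minimal sketch of the difference. The function names are hypothetical. Casting one operand to `uint64_t` asks for the full product, which the compiler should be able to map onto the Cortex-M3 `umull` instruction (32x32 to 64 bits), though the exact instructions depend on the compiler and its flags.

```c
#include <stdint.h>

/* Same expression as in the question: the multiply is done in 32 bits,
 * so the upper half of the product is lost before the widening. */
uint64_t mul32x32_truncated(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Casting one operand first makes the multiply itself 64 bits wide. */
uint64_t mul32x32_full(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

With the values from the question, the first function returns only the low 32 bits of the product, while the second returns all of it.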

So if I estimate, for example, several multiplications, should I count the whole instruction sequence (including loading the values into registers, so plus 3 instructions for each load) or only the multiplication itself (in this case 6 instructions)?

Multiplying two 128-bit variables of the form (x^5+x), where x is a 32-bit variable, gives (a+b)(c+d) = ab+ad+bc+bd, that is 4 multiplications (or 3 by using a faster algorithm). So should I calculate 4*(3+3+6) or 4*(6+?+?)?
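
To make the counting concrete, here is a rough sketch of a schoolbook 128-bit multiply built from 32-bit pieces. The function name `mul128` and the limb layout (four 32-bit words, least significant first) are assumptions for illustration, not something from the code above. Splitting each operand into four 32-bit limbs takes 16 multiplies of 32x32 to 64 bits; the count of 4 in the question corresponds to splitting into two halves instead.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two 128-bit numbers, each stored as four
 * 32-bit limbs (least significant limb first), into a 256-bit product. */
void mul128(const uint32_t a[4], const uint32_t b[4], uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = 0;

    for (int i = 0; i < 4; i++) {
        uint32_t carry = 0;
        for (int j = 0; j < 4; j++) {
            /* A 32x32 product plus two 32-bit values always fits in
             * 64 bits, so this sum cannot overflow. */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;        /* low word stays in place */
            carry = (uint32_t)(t >> 32);     /* high word moves up one limb */
        }
        out[i + 4] = carry;                  /* not yet written by earlier rows */
    }
}
```

Each inner iteration is one widening multiply plus carry handling, so the disassembly of this loop is probably a better basis for a cycle estimate than the single `mul.w` above.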

ARM Instructions

This page contains the cycle counts per instruction for the ARM Cortex-M series processors. If you have the assembly code (which it sounds like you do), it should be easy enough to add up all of your cycles, multiply by 1/clock_freq, and get the total time spent for different scenarios.
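
As a small sketch of that last step, assuming you have already summed the cycles by hand (the function name is hypothetical and the clock frequency is whatever your board actually runs at):

```c
#include <stdint.h>

/* Convert a cycle count into nanoseconds for a given core clock.
 * time = cycles * (1 / clock_hz), scaled to ns and kept in integers.
 * The intermediate product overflows for very large cycle counts
 * (above roughly 1.8e10 cycles). */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_hz)
{
    return cycles * 1000000000ULL / clock_hz;
}
```

For example, 72 cycles on a core clocked at 72 MHz comes out as 1000 ns.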

Another solution is to use SysTick to measure the cycle count.

See this link from ARM

Edit: You can set the reload value to its maximum; the counter is reloaded with that value each time it reaches 0.

// Configure Systick
*STRVR = 0xFFFFFF; // max count
*STCVR = 0;        // force a re-load of the counter value register
*STCSR = 5;        // enable FCLK count without interrupt

You can read the STCVR register, which is a down-counter, before and after the function, and then subtract the overhead cycles (those needed to read the STCVR register itself).
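
The subtraction itself can be sketched like this. The function and macro names are hypothetical. It assumes the reload value is 0xFFFFFF as configured above and that the measured code runs for fewer than 2^24 cycles, so the counter wraps at most once.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu  /* SysTick is a 24-bit down-counter */

/* start and end are STCVR readings taken before and after the code under
 * test; overhead is the cost of the reads themselves, for example measured
 * once with two back-to-back reads. */
uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t overhead)
{
    /* The counter counts down, so elapsed = start - end; masking to
     * 24 bits gives the right answer across a single wrap through 0. */
    uint32_t raw = (start - end) & SYSTICK_MASK;
    return raw > overhead ? raw - overhead : 0;
}
```

On the target you would read `*STCVR` into `start`, run the function, read `*STCVR` into `end`, and pass both to this helper.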
