Load variable from literal pool in assembler with ldr

Question

I want to load variables from the literal pool. The literal pool is at the end of the asm file.

literal_pool_label:
.WORD POOL_EVENT_CHANNEL_2_START_REG_ADDR
.WORD POOL_EVENT_CHANNEL_4_START_REG_ADDR

In the code I wrote:

adr r12, literal_pool_label
ldr r5, [r12, #0]

ldr r5, [r12, #4]

In a C module the variables are defined as follows:

const uint32_t POOL_EVENT_CHANNEL_2_START_REG_ADDR = 0x4100e030;
const uint32_t POOL_EVENT_CHANNEL_4_START_REG_ADDR = 0x4100e040;

If I write the pool in the following way, the value is correct.

.WORD 0x4100e030 // POOL_EVENT_CHANNEL_2_START_REG_ADDR
.WORD 0x4100e040 // POOL_EVENT_CHANNEL_4_START_REG_ADDR

What must I do to get the value from the variable with one instruction?

Answer 1

ARM does not have a double-indirection addressing mode like, say, the pdp11 or another one I can't think of (msp430?).

Your other question is cortex-m based, and perhaps that is why you are trying to do this in ram. You are putting a lot of effort into this without explaining why you need this functionality, or whether saving one instruction is going to make the difference between success and failure for the project. If it is a performance thing then there are other ways around that, and saving one instruction likely isn't going to improve performance in a noticeable way (it can actually make it worse, it depends).

so

ldr r0,hello
ldr r1,world_addr
ldr r2,[r1]
b .

hello: .word 0x12345678
world_addr: .word world_data
.data
world_data: .word 0x87654321

Disassembly of section .text:

00001000 <hello-0x10>:
    1000:   e59f0008    ldr r0, [pc, #8]    ; 1010 <hello>
    1004:   e59f1008    ldr r1, [pc, #8]    ; 1014 <world_addr>
    1008:   e5912000    ldr r2, [r1]
    100c:   eafffffe    b   100c <hello-0x4>

00001010 <hello>:
    1010:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

00001014 <world_addr>:
    1014:   00002000    andeq   r2, r0, r0

Disassembly of section .data:

00002000 <__data_start>:
    2000:   87654321    strbhi  r4, [r5, -r1, lsr #6]!

The assembler can easily be asked to generate pc-relative addressing within a section. Outside the section you normally do a pc-relative load of the address, and the second level of indirection accesses the item itself.
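
To connect this back to the question: a pool entry written as .word followed by a symbol name holds the address of that symbol, not its contents. A minimal C sketch of the same two-step access follows. The variable is the one from the question; pool_entry and load_via_pool are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* The C variable from the question: the linker gives it an address,
   and the value 0x4100e030 is stored at that address. */
const uint32_t POOL_EVENT_CHANNEL_2_START_REG_ADDR = 0x4100e030;

/* Equivalent of `.word POOL_EVENT_CHANNEL_2_START_REG_ADDR`:
   the pool entry holds the ADDRESS of the variable, not its value. */
static const uint32_t *const pool_entry = &POOL_EVENT_CHANNEL_2_START_REG_ADDR;

/* Hypothetical helper, for illustration only. */
uint32_t load_via_pool(void)
{
    const uint32_t *addr = pool_entry; /* first load: fetches the address from the pool */
    return *addr;                      /* second load: fetches the value itself */
}
```

Each of the two C statements corresponds to one ldr, which is why a pool that stores the symbol needs two loads, while a pool that stores the literal 0x4100e030 needs one.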

If you try this, the gnu assembler will complain.

ldr r0,hello
ldr r1,world_addr
ldr r2,[r1]
ldr r3,world_data
b .

hello: .word 0x12345678
world_addr: .word world_data
.data
world_data: .word 0x87654321

Now yes, it is technically possible: there is a pc-relative addressing mode, so if you can reach the variable that way you can do it in one instruction, and it is a matter of telling the assembler.

    1000:   e59f0008    ldr r0, [pc, #8]    ; 1010 <hello>
    ...
    1010:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

Now try it with a much further reach.

.cpu cortex-m4
.thumb
ldr r0,hello
b .
.space 0x20000000
.align
world_data: .word 0x87654321

But the assembler complains.

Since you are writing assembly language, you, like me, have the arm architectural reference manual open on your screen. There you can see that the thumb encoding allows for a 5 bit offset and the thumb2 encoding a 12 bit offset (0-4095 bytes), best case.

Specifies the immediate offset added to or subtracted from the value of <Rn> to form the address. Permitted values are multiples of 4 in the range 0-124 for encoding T1, multiples of 4 in the range 0-1020 for encoding T2, any value in the range 0-4095 for encoding T3, and any value in the range 0-255 for encoding T4. For the offset addressing syntax, <imm> can be omitted, meaning an offset of 0.

On a cortex-m, code (flash) lives below 0x20000000 and ram starts at 0x20000000, running up to some limit like 0x40000000.

That distance is more than you can reach in a single instruction from flash, even if you could get the assembler and linker to work together on it (like they do with branch instructions, for example).
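
The reach argument can be checked with a few lines of C. This is a sketch, assuming the thumb2 LDR (literal) encoding T2, whose base is the pc (instruction address plus 4) aligned down to 4 bytes and whose offset is at most 4095 bytes in either direction. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a thumb2 pc-relative ldr at insn_addr reach a word at target_addr?
   Assumes LDR (literal) encoding T2: base = Align(PC, 4), where PC reads
   as the instruction address + 4, and the offset is at most +/-4095. */
bool ldr_literal_reaches(uint32_t insn_addr, uint32_t target_addr)
{
    uint32_t base = (insn_addr + 4u) & ~3u;
    int64_t delta = (int64_t)target_addr - (int64_t)base;
    return delta >= -4095 && delta <= 4095;
}
```

For an instruction at 0x00000000 in flash and data at 0x20000000 in ram this returns false, while code and data that both sit in ram a few bytes apart are within reach.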

So, the ram solution. You tagged gnu, so I am assuming gnu binutils.

.cpu cortex-m4
.thumb
ldr r0,hello
b .
.align
hello: .word 0x87654321
.data
.word 0x12345

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > rom
    .rodata : { *(.rodata*) } > rom
    .bss    : { *(.bss*)    } > ram
    .data : { *(.data*) } > ram AT > rom
}

Disassembly of section .text:

00000000 <hello-0x4>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <hello>)
   2:   e7fe        b.n 2 <hello-0x2>

00000004 <hello>:
   4:   87654321

Disassembly of section .data:

20000000 <.data>:
20000000:   00012345

00000000  00 48 fe e7 21 43 65 87  45 23 01 00              |.H..!Ce.E#..|
0000000c

S00A0000736F2E7372656338
S30D000000000048FEE72143658775
S309000000084523010085
S70500000000FA

So with .data we would see something like that, and you can see that the .data items are in the flash. You then add labels/variables to the linker script and use them to copy the compile-time initialized, ram-based items to ram before executing the main program (assuming C; in your case you can do it whenever, if this is purely an assembly program).
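
The copy step itself is small. Here is a minimal C sketch of it; copy_words is a hypothetical helper, and the linker symbol names in the trailing comment are assumptions, since the scripts shown here do not define any yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy nwords 32-bit words from the load address (flash) to the run
   address (ram). */
void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords-- > 0) {
        *dst++ = *src++;
    }
}

/* On the target, the bootstrap would call it with symbols exported by the
   linker script. These three names are assumptions, not taken from the
   scripts in this answer:

       extern uint32_t __data_load[], __data_start[], __data_end[];
       copy_words(__data_start, __data_load,
                  (size_t)(__data_end - __data_start));
*/
```

The same helper would serve for a function placed in ram, because the linker treats that section just like .data: a load address in flash and a run address in ram.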

.cpu cortex-m4
.thumb
.thumb_func
fun:
    ldr r0,something
    bx lr
.align
something: .word 0x11223344

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { so.o(.text*)  } > rom
    .rodata : { *(.rodata*) } > rom
    .bss    : { *(.bss*)    } > ram
    .data : { *(.data*) } > ram AT > rom
    .fun  : { fun.o(.text*) } > ram AT > rom
}

Disassembly of section .text:

00000000 <hello-0x4>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <hello>)
   2:   e7fe        b.n 2 <hello-0x2>

00000004 <hello>:
   4:   87654321    strbhi  r4, [r5, -r1, lsr #6]!

Disassembly of section .fun:

20000004 <fun>:
20000004:   46c0        nop         ; (mov r8, r8)
20000006:   4770        bx  lr

(srec)
S00A0000736F2E7372656338
S30D000000000048FEE72143658775
S309000000084523010085
S3090000000CC04670472D
S70500000000FA

And as with .data, you can add linker variables and use them to copy the function from flash to ram before you execute it, ideally in the bootstrap, but if this is a purely asm program with no C then anywhere before you use it.

(no this is not a valid cortex-m program just demonstrates the tools)

You would likely want to start with the linker script associated with the C library you are using, as that is where the linker script and bootstrap usually live in a canned setup (part of an SDK, toolchain, C library, etc). Duplicate the .data handling and start from there, but understand that you will run into the problems I solved in a shoot-from-the-hip manner above:

.text   : { *(.text*)   } > ram

is going to want to take all of the .text, including from other sections/files. And at least in my case it got messy as to what came first (when you want to control the vector table, etc, you may have to do more work rather than simply have it in .text and put the files in order on the command line). So as with any linker script and bootstrap work, you have to iterate, building and disassembling until you get it right.

If your reason for this is performance (or perceived performance) then you can, as someone mentioned, run out of ram. But you can also run the whole project in ram, which makes life easier if you have room, and you should get the best fetch performance (although on a cortex-m7 it might not be best).

.cpu cortex-m4
.thumb
ldr r0,hello
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
b .
.align
hello: .word 0x87654321

MEMORY
{
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > ram
    .rodata : { *(.rodata*) } > ram
    .data   : { *(.data*)   } > ram
    .bss    : { *(.bss*)    } > ram
}


Disassembly of section .text:

20000000 <hello-0x1c>:
20000000:   4806        ldr r0, [pc, #24]   ; (2000001c <hello>)
20000002:   46c0        nop         ; (mov r8, r8)
20000004:   46c0        nop         ; (mov r8, r8)
20000006:   46c0        nop         ; (mov r8, r8)
20000008:   46c0        nop         ; (mov r8, r8)
2000000a:   46c0        nop         ; (mov r8, r8)
2000000c:   46c0        nop         ; (mov r8, r8)
2000000e:   46c0        nop         ; (mov r8, r8)
20000010:   46c0        nop         ; (mov r8, r8)
20000012:   46c0        nop         ; (mov r8, r8)
20000014:   46c0        nop         ; (mov r8, r8)
20000016:   46c0        nop         ; (mov r8, r8)
20000018:   46c0        nop         ; (mov r8, r8)
2000001a:   e7fe        b.n 2000001a <hello-0x2>

2000001c <hello>:
2000001c:   87654321    strbhi  r4, [r5, -r1, lsr #6]!

A few lines of C code can take the output and generate this

copybase: .word 0x20000000
copysize: .word 0x00000008
copydata:
.word 0x46C04806 @0x20000000
.word 0x46C046C0 @0x20000004
.word 0x46C046C0 @0x20000008
.word 0x46C046C0 @0x2000000C
.word 0x46C046C0 @0x20000010
.word 0x46C046C0 @0x20000014
.word 0xE7FE46C0 @0x20000018
.word 0x87654321 @0x2000001C
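
Such a generator could look roughly like the following C sketch. All function names are hypothetical, reading the raw binary image of the ram program into memory is left out, and the image length is assumed to be a multiple of 4.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble one little-endian 32-bit word from four image bytes. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Format one ".word" line into buf; returns what snprintf returns. */
int format_word_line(char *buf, size_t n, uint32_t word, uint32_t addr)
{
    return snprintf(buf, n, ".word 0x%08" PRIX32 " @0x%08" PRIX32, word, addr);
}

/* Emit copybase, copysize (in words) and the copydata table.
   Assumes len is a multiple of 4. */
void emit_copy_table(FILE *out, const uint8_t *image, size_t len, uint32_t base)
{
    char line[64];
    fprintf(out, "copybase: .word 0x%08" PRIX32 "\n", base);
    fprintf(out, "copysize: .word 0x%08" PRIX32 "\n", (uint32_t)(len / 4));
    fprintf(out, "copydata:\n");
    for (size_t i = 0; i + 4 <= len; i += 4) {
        format_word_line(line, sizeof line, le32(image + i), base + (uint32_t)i);
        fprintf(out, "%s\n", line);
    }
}
```

Calling emit_copy_table(stdout, image, len, 0x20000000) on the bytes of the ram program would print a table in the format shown above, ready to paste or include into the flash-side assembly file.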

and within that same adhoc C program or outside you can then do this:

.cpu cortex-m4
.thumb
.syntax unified
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
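reset:
    ldr r0,copybase      @ r0 = destination (ram)
    ldr r1,copysize      @ r1 = number of words to copy
    ldr r2,=copydata     @ r2 = source (flash)
    .align
copy_loop:
    ldr r3,[r2],#4       @ read a word from flash
    str r3,[r0],#4       @ write it to ram
    subs r1,#1
    bne copy_loop
    ldr r0,copybase
    orr r0,#1            @ set the thumb bit
    bx r0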

    copybase: .word 0x20000000
    copysize: .word 0x00000008
    copydata:
    .word 0x46C04806 @0x20000000
    .word 0x46C046C0 @0x20000004
    .word 0x46C046C0 @0x20000008
    .word 0x46C046C0 @0x2000000C
    .word 0x46C046C0 @0x20000010
    .word 0x46C046C0 @0x20000014
    .word 0xE7FE46C0 @0x20000018
    .word 0x87654321 @0x2000001C

and this doesn't necessarily even need a linker script; -Ttext=0 should suffice, but if not then

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > rom
}

gnu linker does have a bug with respect to such things so a linker script is cleaner.

In both cases the linker scripts become trivial as well as the bootstrap for C, if you craft it right your bootstrap can be:

reset:
   bl main
   b .

for the ram based program.

Your fetch performance is generally one clock for sram, where flash is slow and can get worse as you use a faster processor clock speed on many mcus.

And you get your single cycle ldr.

If on an armv6-m and not an armv7-m then that is an easy adjustment...the copy/jump obviously won't work as is.

Note that if it was only ldr you were after you could have just done this

    ldr r0,something
...
something: .word 0x11223344

and both would land in .text and be ideally pc-relative depending on the instruction set and distance. None of the above was required. If you want to read-write that value from somewhere else and have this code simply read it then yes the data needs to be in ram.
