Load variable from literal pool in assembler with ldr

I want to load variables from the literal pool. The literal pool is at the end of the asm file.

literal_pool_label:
.WORD POOL_EVENT_CHANNEL_2_START_REG_ADDR
.WORD POOL_EVENT_CHANNEL_4_START_REG_ADDR

In the code I wrote:

adr r12, literal_pool_label
ldr r5, [r12, #0]

ldr r5, [r12, #4]

In a C module the variables are defined as follows:

const uint32_t POOL_EVENT_CHANNEL_2_START_REG_ADDR = 0x4100e030;
const uint32_t POOL_EVENT_CHANNEL_4_START_REG_ADDR = 0x4100e040;

If I write the pool in the following way, the value is correct.

.WORD 0x4100e030 // POOL_EVENT_CHANNEL_2_START_REG_ADDR
.WORD 0x4100e040 // POOL_EVENT_CHANNEL_4_START_REG_ADDR

What must I do to get the value from the variable with one instruction?

ARM does not have a double-indirection addressing mode like, say, the PDP-11 or another I can't think of (MSP430?).
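To see why two loads are needed, it can help to model the pool in C. This is only a sketch: the array name literal_pool and the function load_via_pool are made up for illustration. The point it shows is that ".WORD symbol" stores the symbol's address, so getting the value takes one load for the pool entry and a second load through it.

```c
#include <stdint.h>

/* The variables from the question's C module. */
const uint32_t POOL_EVENT_CHANNEL_2_START_REG_ADDR = 0x4100e030;
const uint32_t POOL_EVENT_CHANNEL_4_START_REG_ADDR = 0x4100e040;

/* Hypothetical C view of the pool: ".WORD symbol" emits the symbol's
   address, not the value stored at that address. */
const uint32_t *const literal_pool[] = {
    &POOL_EVENT_CHANNEL_2_START_REG_ADDR,
    &POOL_EVENT_CHANNEL_4_START_REG_ADDR,
};

/* First load: fetch the pool entry (an address).
   Second load: dereference it to get the value. */
uint32_t load_via_pool(unsigned index)
{
    const uint32_t *entry = literal_pool[index]; /* ldr r5, [r12, #4*index] */
    return *entry;                               /* ldr r5, [r5]            */
}
```

Writing the constant itself into the pool, as in the ".WORD 0x4100e030" variant from the question, removes the second step because the pool entry then is the value.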

Your other question is Cortex-M based, and perhaps this is why you are trying to do this in RAM. You are putting a lot of effort into this without explaining why you need this functionality, or whether saving one instruction is going to make the difference between success and failure for the project. If it is a performance thing, then there are other ways around that, and one instruction is unlikely to improve performance in a noticeable way (it can actually make it worse, depending).

So:

ldr r0,hello
ldr r1,world_addr
ldr r2,[r1]
b .

hello: .word 0x12345678
world_addr: .word world_data
.data
world_data: .word 0x87654321

Disassembly of section .text:

00001000 <hello-0x10>:
    1000:   e59f0008    ldr r0, [pc, #8]    ; 1010 <hello>
    1004:   e59f1008    ldr r1, [pc, #8]    ; 1014 <world_addr>
    1008:   e5912000    ldr r2, [r1]
    100c:   eafffffe    b   100c <hello-0x4>

00001010 <hello>:
    1010:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

00001014 <world_addr>:
    1014:   00002000    andeq   r2, r0, r0

Disassembly of section .data:

00002000 <__data_start>:
    2000:   87654321    strbhi  r4, [r5, -r1, lsr #6]!

The assembler can easily be asked to generate pc-relative addressing within the section. Outside the section you normally do a pc-relative load of the address, and then the second level of indirection accesses the item itself.

If you try this, the GNU assembler will complain.

ldr r0,hello
ldr r1,world_addr
ldr r2,[r1]
ldr r3,world_data
b .

hello: .word 0x12345678
world_addr: .word world_data
.data
world_data: .word 0x87654321

Now yes, it is technically possible, because there is a pc-relative addressing mode: if you can reach the variable that way, then you can do it in one instruction, and it is a matter of telling the assembler.

    1000:   e59f0008    ldr r0, [pc, #8]    ; 1010 <hello>
    ...
    1010:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

Now try it with a much further reach:

.cpu cortex-m4
.thumb
ldr r0,world_data
b .
.space 0x20000000
.align
world_data: .word 0x87654321

But the assembler complains.

Since you are writing assembly language, you, like me, have the ARM Architecture Reference Manual open on your screen, and you can see that the Thumb encoding allows for a 5-bit offset and the Thumb-2 encoding a 12-bit offset, best case.

Specifies the immediate offset added to or subtracted from the value of <Rn> to form the address. Permitted values are multiples of 4 in the range 0-124 for encoding T1, multiples of 4 in the range 0-1020 for encoding T2, any value in the range 0-4095 for encoding T3, and any value in the range 0-255 for encoding T4. For the offset addressing syntax, <imm> can be omitted, meaning an offset of 0.
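As a rough illustration of the ranges in that excerpt, here is a small C sketch that checks whether an offset fits a given encoding. The enum and function names are made up for this example. The limits are the ones quoted above, and the check only looks at the magnitude of the offset, not at whether it is added or subtracted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encodings named in the manual excerpt above. */
enum ldr_encoding { ENC_T1, ENC_T2, ENC_T3, ENC_T4 };

/* Returns true if the offset magnitude fits the range the excerpt
   gives for that encoding. */
bool ldr_imm_offset_fits(enum ldr_encoding enc, uint32_t offset)
{
    switch (enc) {
    case ENC_T1: return offset <= 124  && (offset % 4) == 0;
    case ENC_T2: return offset <= 1020 && (offset % 4) == 0;
    case ENC_T3: return offset <= 4095;
    case ENC_T4: return offset <= 255;
    }
    return false;
}
```

None of these ranges comes anywhere near the 0x20000000 bytes of padding used in the failing example.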

On a Cortex-M, code is below 0x20000000 and RAM is at 0x20000000 and above, up to some limit like 0x40000000.

That is more than you can reach in a single instruction from flash, even if you could get the assembler and linker to work together to do it (as they can with branch instructions, for example).
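The distance argument can be put into numbers with a short C sketch. It assumes the Thumb rule that a pc-relative load works from the instruction address plus 4, rounded down to a multiple of 4, which agrees with the "ldr r0, [pc, #...]" lines in the disassembly listings in this answer. The function names are invented for the example, and the 4095 limit used in the usage note is simply the largest offset in the manual excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base address a Thumb pc-relative load works from: the instruction
   address plus 4, rounded down to a multiple of 4. */
uint32_t thumb_literal_base(uint32_t insn_addr)
{
    return (insn_addr + 4u) & ~3u;
}

/* Address read by "ldr rX, [pc, #offset]" located at insn_addr. */
uint32_t thumb_literal_target(uint32_t insn_addr, uint32_t offset)
{
    return thumb_literal_base(insn_addr) + offset;
}

/* True if target is within max_offset bytes of the base, either side. */
bool pc_relative_reachable(uint32_t insn_addr, uint32_t target,
                           uint32_t max_offset)
{
    uint32_t base = thumb_literal_base(insn_addr);
    uint32_t dist = target >= base ? target - base : base - target;
    return dist <= max_offset;
}
```

With max_offset set to 4095, a word at address 4 is reachable from an instruction at address 0, while a word at 0x20000000 is not.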

So, the RAM solution. You tagged gnu, so I am assuming GNU binutils.

.cpu cortex-m4
.thumb
ldr r0,hello
b .
.align
hello: .word 0x87654321
.data
.word 0x12345

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > rom
    .rodata : { *(.rodata*) } > rom
    .bss    : { *(.bss*)    } > ram
    .data   : { *(.data*)   } > ram AT > rom
}

Disassembly of section .text:

00000000 <hello-0x4>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <hello>)
   2:   e7fe        b.n 2 <hello-0x2>

00000004 <hello>:
   4:   87654321

Disassembly of section .data:

20000000 <.data>:
20000000:   00012345

00000000  00 48 fe e7 21 43 65 87  45 23 01 00              |.H..!Ce.E#..|
0000000c

S00A0000736F2E7372656338
S30D000000000048FEE72143658775
S309000000084523010085
S70500000000FA

So with .data we would see something like that, and you can see that the .data items are in the flash. You then add labels/variables to the linker script and use them to copy the compile-time-initialized, RAM-based items to RAM before executing the main program (assuming C; in your case, if this is purely an assembly program, you can do it whenever).
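One way to confirm where the .data word ends up is to decode the S-record lines. The following is a minimal sketch with made-up function names. It relies on two properties of the format: all bytes after the "S" and type digit, checksum included, add up to 0xFF in the low byte, and an S3 record carries a 32-bit load address right after the count byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Check the trailing checksum of one S-record line such as
   "S309000000084523010085". */
bool srec_checksum_ok(const char *line)
{
    size_t len = strlen(line);
    if (len < 6 || line[0] != 'S' || (len % 2) != 0)
        return false;

    unsigned sum = 0;
    /* Skip "S" and the type digit, then add up every byte, checksum
       included. */
    for (size_t i = 2; i < len; i += 2) {
        int hi = hex_digit(line[i]);
        int lo = hex_digit(line[i + 1]);
        if (hi < 0 || lo < 0)
            return false;
        sum += (unsigned)(hi * 16 + lo);
    }
    /* A valid record sums to 0xFF in its low byte. */
    return (sum & 0xFFu) == 0xFFu;
}

/* Load address of an S3 record: the 8 hex digits after the count.
   Call this only on a line that passed srec_checksum_ok(). */
uint32_t srec_s3_address(const char *line)
{
    uint32_t addr = 0;
    for (size_t i = 4; i < 12; i++)
        addr = (addr << 4) | (uint32_t)hex_digit(line[i]);
    return addr;
}
```

For the record holding 45 23 01 00 this gives a load address of 0x00000008, which is in the rom region even though the section's run address is 0x20000000.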

.cpu cortex-m4
.thumb
.thumb_func
fun:
    ldr r0,something
    bx lr
.align
something: .word 0x11223344

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { so.o(.text*)  } > rom
    .rodata : { *(.rodata*) } > rom
    .bss    : { *(.bss*)    } > ram
    .data   : { *(.data*)   } > ram AT > rom
    .fun  : { fun.o(.text*) } > ram AT > rom
}

Disassembly of section .text:

00000000 <hello-0x4>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <hello>)
   2:   e7fe        b.n 2 <hello-0x2>

00000004 <hello>:
   4:   87654321    strbhi  r4, [r5, -r1, lsr #6]!

Disassembly of section .fun:

20000004 <fun>:
20000004:   46c0        nop         ; (mov r8, r8)
20000006:   4770        bx  lr

(srec)
S00A0000736F2E7372656338
S30D000000000048FEE72143658775
S309000000084523010085
S3090000000CC04670472D
S70500000000FA

And as with .data, you can add linker variables and use them to copy the function from flash to RAM before you execute it, ideally in the bootstrap, but if this is a purely asm program with no C, then anywhere before you use it.
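The copy step described for .data and for the .fun section might look like this in C. This is a sketch, not a drop-in bootstrap: the function itself is generic, and the linker symbol names in the comment (__fun_load, __fun_start, __fun_end) are hypothetical and would have to be defined in your own linker script.

```c
#include <stdint.h>

/* Copy words from a load address (flash) to a run address range (ram).
   dst_end points one word past the last word to write. */
void copy_section(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end)
        *dst++ = *src++;
}

/* Hypothetical use from a bootstrap, with symbols the linker script
   would need to provide:

       extern const uint32_t __fun_load[];
       extern uint32_t __fun_start[], __fun_end[];
       copy_section(__fun_load, __fun_start, __fun_end);
*/
```

The same routine serves for .data and for code, since at this point both are just words that were stored in flash and need to sit in RAM before first use.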

(No, this is not a valid Cortex-M program; it just demonstrates the tools.)

You would likely want to start with the linker script associated with the C library you are using, as that is where the linker script and bootstrap usually live in a canned setup (part of an SDK, toolchain, C library, etc.). Duplicate the .data handling and start from there, but understand that you will run into the problems I solved in a shoot-from-the-hip manner above:

.text   : { *(.text*)   } > ram

is going to want to take all of the .text, including from other sections/files. And at least in my case it got messy as to what came first (when you want to control the vector table, etc., you may have to do more work rather than simply having it in .text and putting the files in order on the command line). So as with any linker script and bootstrap work, you have to iterate, building and disassembling until you get it right.

If your reason for this is performance (or perceived performance), then you can, as someone mentioned, run from RAM. You can even run the whole project in RAM, which makes life easier if you have room, and you should get the best fetch performance (although on a Cortex-M7 it might not be best).

.cpu cortex-m4
.thumb
ldr r0,hello
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
b .
.align
hello: .word 0x87654321

MEMORY
{
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > ram
    .rodata : { *(.rodata*) } > ram
    .data   : { *(.data*)   } > ram
    .bss    : { *(.bss*)    } > ram
}


Disassembly of section .text:

20000000 <hello-0x1c>:
20000000:   4806        ldr r0, [pc, #24]   ; (2000001c <hello>)
20000002:   46c0        nop         ; (mov r8, r8)
20000004:   46c0        nop         ; (mov r8, r8)
20000006:   46c0        nop         ; (mov r8, r8)
20000008:   46c0        nop         ; (mov r8, r8)
2000000a:   46c0        nop         ; (mov r8, r8)
2000000c:   46c0        nop         ; (mov r8, r8)
2000000e:   46c0        nop         ; (mov r8, r8)
20000010:   46c0        nop         ; (mov r8, r8)
20000012:   46c0        nop         ; (mov r8, r8)
20000014:   46c0        nop         ; (mov r8, r8)
20000016:   46c0        nop         ; (mov r8, r8)
20000018:   46c0        nop         ; (mov r8, r8)
2000001a:   e7fe        b.n 2000001a <hello-0x2>

2000001c <hello>:
2000001c:   87654321    strbhi  r4, [r5, -r1, lsr #6]!

A few lines of C code can take the output and generate this:

copybase: .word 0x20000000
copysize: .word 0x00000008
copydata:
.word 0x46C04806 @0x20000000
.word 0x46C046C0 @0x20000004
.word 0x46C046C0 @0x20000008
.word 0x46C046C0 @0x2000000C
.word 0x46C046C0 @0x20000010
.word 0x46C046C0 @0x20000014
.word 0xE7FE46C0 @0x20000018
.word 0x87654321 @0x2000001C
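A table like the one above could be produced by something along these lines. This is a guess at what such a helper might look like, not the program that was actually used: the function names are invented, and it assumes the input is a raw little-endian binary image of the RAM program whose length is a multiple of 4.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble four little-endian bytes into one 32-bit word. */
uint32_t le32_word(const uint8_t *bytes)
{
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}

/* Format one ".word 0xVALUE @0xADDRESS" line into buf.
   Returns the number of characters snprintf wanted to write. */
int format_word_line(char *buf, size_t bufsz, const uint8_t *bytes,
                     uint32_t addr)
{
    return snprintf(buf, bufsz, ".word 0x%08" PRIX32 " @0x%08" PRIX32,
                    le32_word(bytes), addr);
}

/* Print the whole copybase/copysize/copydata table for an image that
   should run at address base. len is assumed to be a multiple of 4. */
void emit_copy_table(FILE *out, const uint8_t *image, size_t len,
                     uint32_t base)
{
    char line[40];

    fprintf(out, "copybase: .word 0x%08" PRIX32 "\n", base);
    fprintf(out, "copysize: .word 0x%08" PRIX32 "\n", (uint32_t)(len / 4));
    fprintf(out, "copydata:\n");
    for (size_t i = 0; i + 3 < len; i += 4) {
        format_word_line(line, sizeof line, image + i, base + (uint32_t)i);
        fprintf(out, "%s\n", line);
    }
}
```

Feeding it the 32 bytes of the RAM program with a base of 0x20000000 should reproduce the eight .word lines, with copysize counting words rather than bytes.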

and within that same ad hoc C program, or outside it, you can then do this:

.cpu cortex-m4
.thumb
.syntax unified
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
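reset:
    ldr r0,copybase     @ r0 = destination in ram
    ldr r1,copysize     @ r1 = number of words to copy
    ldr r2,=copydata    @ r2 = source in flash
    .align
copy_loop:
    ldr r3,[r2],#4      @ read a word from flash
    str r3,[r0],#4      @ write it to ram
    subs r1,#1
    bne copy_loop
    ldr r0,copybase
    orr r0,#1           @ set bit 0, the target is thumb code
    bx r0
    .align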

    copybase: .word 0x20000000
    copysize: .word 0x00000008
    copydata:
    .word 0x46C04806 @0x20000000
    .word 0x46C046C0 @0x20000004
    .word 0x46C046C0 @0x20000008
    .word 0x46C046C0 @0x2000000C
    .word 0x46C046C0 @0x20000010
    .word 0x46C046C0 @0x20000014
    .word 0xE7FE46C0 @0x20000018
    .word 0x87654321 @0x2000001C

and this doesn't necessarily even need a linker script; -Ttext=0 should suffice, but if not, then:

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > rom
}

The GNU linker does have a bug with respect to such things, so a linker script is cleaner.

In both cases the linker scripts become trivial, as does the bootstrap for C. If you craft it right, your bootstrap can be:

reset:
   bl main
   b .

for the RAM-based program.

Your fetch performance is generally one clock for SRAM, whereas flash is slow and can get worse as you use a faster processor clock speed on many MCUs.

And you get your single-cycle ldr.

If you are on an ARMv6-M and not an ARMv7-M, then that is an easy adjustment... the copy/jump obviously won't work as is.

Note that if it was only the ldr you were after, you could have just done this:

    ldr r0,something
...
something: .word 0x11223344

and both would land in .text and ideally be pc-relative, depending on the instruction set and distance. None of the above would be required. If you want to read and write that value from somewhere else and have this code simply read it, then yes, the data needs to be in RAM.
