
ARM LDR instruction on PC register

Asked 2014-06-09 07:49:38 · Tags: assembly, arm, patch, monkeypatching

Here is how I understand the story:

The PC register holds a pointer to the next instruction.
The LDR instruction loads the value of the second operand into the first operand. For example,
```
LDR r0, [pc, 0x5678]
```
is equivalent to this "C code":
```
r0 = *(pc + 0x5678)
```
It's a pointer dereference with a base offset.

And my question:

I found this code:

LDR PC, [PC,-4]

It's commented as being for monkey patching, etc.

Here is how I understand this code:

pc = *(pc - 4)

In this case the "pc" register will dereference the address of the previous instruction and will contain the "machine code" of that instruction (not the address of an instruction), and the program will jump to that invalid address to continue execution, so we will probably get a "Segmentation Fault". So what am I missing or not understanding?

What makes me wonder is the brackets around the second operand of the LDR instruction. As far as I know, on the x86 architecture brackets already dereference the pointer, but I can't understand what they mean on the ARM architecture.

mov r1, 0x5678
add r1, pc
mov r0, [r1]

Is this code equivalent to the following?

LDR r0, [pc, 0x5678]

2 answers

Quoting from section 4.9.4 of the ARM Instruction Set document (ARM DDI 0029E):

When using R15 as the base register you must remember it contains an address 8 bytes on from the address of the current instruction.

So that instruction will load the word located 4 bytes after the current instruction, which hopefully contains a valid address.
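The arithmetic behind that conclusion can be sketched in a few lines of Python. This is only a toy model, assuming ARM state (not Thumb) with 4-byte instructions; the function name `ldr_pc_relative_address` and the example address `0x8000` are made up for illustration.

```python
ARM_PC_READ_AHEAD = 8  # in ARM state, reading PC yields the current instruction's address + 8

def ldr_pc_relative_address(instr_addr, offset):
    """Address accessed by `LDR Rd, [PC, #offset]` located at instr_addr (toy model)."""
    pc_as_read = instr_addr + ARM_PC_READ_AHEAD
    return pc_as_read + offset

instr_addr = 0x8000  # hypothetical address of `LDR PC, [PC, #-4]`
target_slot = ldr_pc_relative_address(instr_addr, -4)
print(hex(target_slot))  # 0x8004: the word right after the instruction
```

So `pc - 4` does not point at the previous instruction, as the question assumed, but at the word immediately after the `LDR` itself.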

Thanks to a quirk of the ARM architecture, LDR PC, [PC,-4] ~~is a branch to the following instruction (assuming we're talking ARM, not Thumb here), thus under normal circumstances it has no effect (other than performance). The point is, by putting that instruction at the start of a function it's then really simple for the code to patch itself at runtime by rewriting the bottom 12 bits of the LDR instruction to change the offset, thus redirecting that function somewhere else.~~ branches to an address stored in memory in the word immediately following the instruction.

Herp derp, I got ADR and LDR confused there - the struck-out text above would be true if it were ADR, but this case is even more straightforward.

Now that I've unconfused myself, it's just a simple function call trampoline. The function address will be stored as a data word immediately following the LDR instruction (presumably set to some initial value by the linker) and can simply be rewritten as data at runtime to redirect the branch, without needing to resort to self-modifying code.
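As a rough illustration of the trampoline idea, the following Python toy model treats memory as a dictionary of 32-bit words. The addresses `0x8000`, `ORIGINAL_IMPL` and `PATCHED_IMPL` and the helper `step_trampoline` are invented for the sketch; `0xE51FF004` is the ARM-state encoding of `LDR PC, [PC, #-4]`.

```python
LDR_PC_PC_MINUS_4 = 0xE51FF004  # ARM-state encoding of `LDR PC, [PC, #-4]`

def step_trampoline(memory, instr_addr):
    """Return the new PC after executing the trampoline at instr_addr (toy model)."""
    if memory[instr_addr] != LDR_PC_PC_MINUS_4:
        raise ValueError("not a LDR PC, [PC, #-4] trampoline")
    pc_as_read = instr_addr + 8    # PC reads 8 bytes ahead in ARM state
    return memory[pc_as_read - 4]  # load the data word that follows the instruction

# Hypothetical layout: trampoline at 0x8000, two candidate targets elsewhere.
ORIGINAL_IMPL = 0x00010000
PATCHED_IMPL = 0x00020000
memory = {0x8000: LDR_PC_PC_MINUS_4, 0x8004: ORIGINAL_IMPL}

before = step_trampoline(memory, 0x8000)
memory[0x8004] = PATCHED_IMPL      # patch the data word only; the instruction is untouched
after = step_trampoline(memory, 0x8000)
print(hex(before), hex(after))     # 0x10000 0x20000
```

Only the word at `0x8004` changes between the two calls, which is why this kind of redirection counts as rewriting data rather than rewriting code.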