Writing MIPS machine instructions and executing them from C

Question

I'm trying to write some self-modifying code in C and MIPS.

Since I want to modify the code later on, I'm trying to write actual machine instructions (as opposed to inline assembly) and to execute those instructions. Someone told me that it would be possible to just malloc some memory, write the instructions there, point a C function pointer at it and then jump to it. (I include the example below.)

I've tried this with my cross compiler (Sourcery CodeBench toolchain) and it doesn't work (yes, in hindsight I suppose it does seem rather naive). How can I do this properly?

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>


void inc(){
    int i = 41;
    uint32_t *addone = malloc(sizeof(*addone) * 2); //we malloc space for our asm function
    *(addone) = 0x20820001; // this is addi $v0 $a0 1, which adds one to our arg (gcc calling con)
    *(addone + 1) = 0x23e00000; //this is jr $ra

    int (*f)(int x) = addone; //our function pointer
    i = (*f)(i);
    printf("%d",i);    
}

int main(){
    inc();
exit(0);}

I follow the gcc calling convention here, where the arguments are passed in $a0 and the results of functions are expected to be in $v0. I don't actually know whether the return address will be put into $ra (but I can't test it yet since I can't compile). I use int for my instructions because I'm compiling for MIPS32 (hence a 32-bit int should be enough).

Answer 1

You are using pointers inappropriately. Or, to be more accurate, you aren't using pointers where you should be.

Try this on for size:

uint32_t *addone = malloc(sizeof(*addone) * 2);
addone[0] = 0x03e00008; // jr $ra (0x23e00000 would decode as addi $zero, $ra, 0)
addone[1] = 0x20820001; // addi $v0, $a0, 1, executed in the delay slot of jr

int (*f)(int x) = (int (*)(int))addone; // our function pointer
i = (*f)(i);
printf("%d\n", i);

You may also need to mark the memory as executable after writing to it, but before calling it:

mprotect(addone, sizeof(int) * 2, PROT_READ | PROT_EXEC);

mprotect only accepts a page-aligned address, so to make this work you may additionally need to allocate a considerably larger block of memory (a full page, 4k or so) and make sure the address you pass is page-aligned, for example by using a page-aligned allocator.
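A minimal sketch of the page-aligned variant, assuming a POSIX system with mmap and MAP_ANONYMOUS. The helper name load_code is made up for illustration and is not from the answer; it takes a whole page from the kernel instead of malloc, copies the instruction words in, and only then switches the mapping to read+execute.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc, needed for MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes of machine code into a fresh,
 * page-aligned mapping and make it read+execute.
 * Returns NULL on failure. */
void *load_code(const void *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len > 0);

    /* round the size up to a whole number of pages */
    size_t size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* drop write permission, add execute permission */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }
    return mem;
}
```

Because mmap always returns a page-aligned address, the mprotect call cannot fail on alignment grounds. On MIPS the caches would still have to be synchronized before calling into the returned buffer.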

Answer 2

You also need to make sure that the memory in question is executable, and make sure it gets flushed properly from the dcache after writing it and loaded into the icache before executing it. How to do that depends on the OS running on your MIPS machine.

On Linux, you would use the mprotect system call to make the memory executable, and the cacheflush system call to do the cache flushing.

Edit

Example:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/cachectl.h>   // cacheflush(), ICACHE, DCACHE (Linux on MIPS)

// round P down to the start of the page that contains it
#define PALIGN(P)  ((char *)((uintptr_t)(P) & ~(pagesize-1)))
uintptr_t  pagesize;

void inc(){
    int i = 41;
    uint32_t *addone = malloc(sizeof(*addone) * 2); // we malloc space for our asm function
    *(addone) = 0x03e00008;     // jr $ra (0x23e00000 would decode as addi $zero, $ra, 0)
    *(addone + 1) = 0x20820001; // addi $v0, $a0, 1, executed in the delay slot of jr

    pagesize = sysconf(_SC_PAGESIZE);  // only needs to be done once
    mprotect(PALIGN(addone), PALIGN(addone+1)-PALIGN(addone)+pagesize,
             PROT_READ | PROT_WRITE | PROT_EXEC);
    cacheflush(addone, 2*sizeof(*addone), ICACHE|DCACHE);

    int (*f)(int x) = (int (*)(int))addone; // our function pointer
    i = (*f)(i);
    printf("%d",i);
}

Note that we make the entire page(s) containing the code both writable and executable. That's because memory protection works per page, and we want malloc to be able to continue to use the rest of the page(s) for other things. You could instead use valloc or memalign to allocate entire pages, in which case you could safely make the code read-only and executable.
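The page arithmetic is easy to get wrong, so here is a small sketch of it in isolation. The struct and function names (page_range, covering_pages) are made up for illustration; the only assumption is that the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the page-aligned region to hand to mprotect. */
struct page_range {
    uintptr_t start;  /* address of the first page */
    size_t    len;    /* length in bytes, a multiple of the page size */
};

/* Compute the whole pages that cover [addr, addr + len).
 * pagesize must be a power of two. */
struct page_range covering_pages(const void *addr, size_t len,
                                 uintptr_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    assert(len > 0);

    /* clearing the low bits rounds an address down to its page start */
    uintptr_t first = (uintptr_t)addr & ~(pagesize - 1);
    uintptr_t last  = ((uintptr_t)addr + len - 1) & ~(pagesize - 1);

    struct page_range r = { first, (size_t)(last - first + pagesize) };
    return r;
}
```

With 4096-byte pages, an 8-byte buffer at 0x1ffc straddles a page boundary, so the range comes out as two pages starting at 0x1000. Masking with pagesize-1 instead of its complement would yield the offset inside the page rather than the page start.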

Answer 3

The OP's code as written compiles without errors with CodeSourcery mips-linux-gnu-gcc.

As others have mentioned above, self-modifying code on MIPS requires the instruction cache to be synchronized with the data cache after the code is written. The MIPS32R2 version of the MIPS architecture added the SYNCI instruction, which is a user-mode instruction that does what you need here. All modern MIPS CPUs implement MIPS32R2, including SYNCI.

Memory protection is an option on MIPS, but most MIPS CPUs are not built with this feature selected, so the mprotect system call is likely not needed on most real MIPS hardware.

Note that if you use any optimization level besides -O0, the compiler can and does optimize away the stores to *addone and the function call, which breaks your code. Using the volatile keyword prevents the compiler from doing this.

The following code generates correct MIPS assembly, but I don't have MIPS hardware handy to test it on:

int inc() {
    volatile int i = 41;
    // malloc 8 x sizeof(int) to allocate 32 bytes, i.e. one typical cache line.
    // malloc does not promise cache-line alignment, but its (typically 8-byte)
    // alignment keeps the two instruction words below within a single line.
    volatile int *addone = malloc(sizeof(*addone) * 8);
    *(addone)     = 0x03e00008; // jr $ra (0x23e00000 would decode as addi $zero, $ra, 0)
    *(addone + 1) = 0x20820001; // addi $v0, $a0, 1, executed in the delay slot of jr
    // use a SYNCI instruction to flush the data written above from
    // the D cache and to flush any stale data from the I cache
    asm volatile("synci 0(%0)": : "r" (addone) : "memory");
    int (*f)(int x) = (int (*)(int))addone; // our function pointer
    int j = (*f)(i);
    return j;
}

int main(){
    int k = 0;
    k = inc();
    printf("%d",k);    
    exit(0);
}
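As an alternative to hand-written SYNCI, GCC and Clang provide the builtin __builtin___clear_cache(begin, end), which is meant to do whatever the target needs to make freshly written instructions visible to instruction fetch. This is a swapped-in technique, not what the answer above uses, and the helper name publish_code is made up for illustration. A minimal sketch, assuming one of those compilers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords instruction words into dst, then ask
 * the compiler to synchronize the instruction cache for that range.
 * Returns the number of bytes written. */
size_t publish_code(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];

    /* GCC/Clang builtin; on targets with coherent caches it does nothing */
    __builtin___clear_cache((char *)dst, (char *)(dst + nwords));

    return nwords * sizeof *dst;
}
```

The builtin takes a byte range, so it covers buffers that span several cache lines, whereas a single SYNCI only handles the line containing its address.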

Answer 4

Calling a function is much more complicated than just jumping to an instruction.

How are arguments passed? Are they stored in registers, or pushed to the call stack?
How is a value returned?
Where is the return address placed for the return jump? If you have a recursive function, $ra doesn't cut it.
Is the caller or the callee responsible for popping the stack frame when the called function completes?

Different calling conventions have different answers to these questions. Though I've never tried anything like what you're doing, I would assume you'd have to write your machine code to match a convention, then tell the compiler that your function pointer uses that convention (different compilers have different ways of doing this; gcc does it with function attributes).
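For the convention the question assumes (first argument in $a0, result in $v0, return address in $ra), the instruction words can be checked by assembling them from their bit fields. The sketch below uses the standard MIPS32 I-type and R-type layouts and the usual register numbers ($v0 = 2, $a0 = 4, $ra = 31); the function names mips_addi and mips_jr are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers used by the example */
enum { REG_ZERO = 0, REG_V0 = 2, REG_A0 = 4, REG_RA = 31 };

/* addi rt, rs, imm  (I-type: opcode 0x08 | rs | rt | 16-bit immediate) */
uint32_t mips_addi(unsigned rt, unsigned rs, int16_t imm)
{
    assert(rt < 32 && rs < 32);
    return (0x08u << 26) | ((uint32_t)rs << 21) | ((uint32_t)rt << 16)
         | (uint16_t)imm;
}

/* jr rs  (R-type: opcode 0 | rs | zeros | funct 0x08) */
uint32_t mips_jr(unsigned rs)
{
    assert(rs < 32);
    return ((uint32_t)rs << 21) | 0x08u;
}
```

mips_addi(REG_V0, REG_A0, 1) gives 0x20820001 and mips_jr(REG_RA) gives 0x03e00008. The word 0x23e00000 is what mips_addi(REG_ZERO, REG_RA, 0) produces, so it is an addi that writes to $zero, not a jump. Note also that jr has a branch delay slot: the instruction placed right after it still executes before control returns to the caller.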

4 answers

Answer 1: 2, 2012-10-31 16:57:16
Answer 2: 2, 2012-10-31 17:00:45
Answer 3: 2, accepted, 2012-11-01 02:26:36
Answer 4: 0, 2012-10-31 16:30:11
