Decode function pointer in C

Is it possible to store the contents behind a function pointer in C? I know you can store every kind of pointer in a variable. But if I can "unwrap" an integer pointer (to an integer) or a string pointer (to an unsigned char), shouldn't I be able to decode a function pointer too?

To be clearer, I mean storing the machine code instructions in a variable.

You're missing an important fact: A function isn't a (first-class) object in C.

There are two basic types of pointers in C: data pointers and function pointers. Both can be dereferenced using *.

The similarities end here. A data object has a stored value, so dereferencing a data pointer accesses this value:

int a = 5;
int *b = &a;
int c = *b; // 5

A function is just this, a function. You can call a function, so you can call the result of dereferencing a function pointer. It doesn't have a stored value:

int x(void) { return 1; }
int (*y)(void) = &x; // valid also without the address-of operator

// ...
int main(void)
{
    int a = (*y)();  // valid also without explicit dereference like int a = y();
}

For ease of handling, C allows omitting the & operator when assigning a function to a function pointer and also omitting the explicit dereference when calling a function through a function pointer.

In short: using pointers doesn't change anything about the semantics of data objects vs functions.


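Also note in this context that function pointers and data pointers aren't compatible. You can't portably assign a function pointer to void *: the C standard defines no conversion between the two, even though many compilers accept it with a cast. It's even possible to have a platform where a function pointer has a different size from a data pointer.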


In practice, on a platform where a function pointer has the same format as a data pointer, you could "convince" your compiler to access the actual binary code located there by casting your pointer to const char *. But be aware this is undefined behavior.

A pointer in C is the address of some object in memory. An int * is the address of an int; a pointer to a function is the address where the code of the function is stored in memory.

While you can read some bytes from the address of a function in memory, they are just bytes and nothing else. You need to know how to interpret these bytes in order to "store the machine code instructions in a variable". And the real problem here is to know where to stop: where the code of one function ends and the code of another function begins.

These things are not defined by the language and they depend on many factors: the processor architecture, the OS, the compiler, and the compiler flags used to compile the code (the optimization level, for example).

The real question here is: assuming you can "store the machine code instructions in a variable", how do you want to use it? It is just a sequence of bytes, meaningless to most humans, and it cannot be used to execute the function. If you are not writing a compiler, linker, emulator, operating system or something similar, there is nothing useful you can do with the machine code instructions of a function. (And if you are writing one of the above, then you know the answer and you do not ask such questions on Stack Overflow or anywhere else.)

The code here is meant as a skeleton for injecting code into a program. But if you run it on an OS such as Linux or Windows, you will get an exception before the first instruction that fn_ptr points to is executed, because heap memory returned by malloc is normally not executable.

#include <stdio.h>
#include <malloc.h>

typedef int FN(void);

int main(void)
{
        FN * fn_ptr;
        char * x;

        fn_ptr = malloc(10240);
        x = (char *)fn_ptr;

        // ... Insert code into x that points the same memory of fn_ptr;
        x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
        fn_ptr();

        return 0;
}

If you execute this code using gdb, you obtain this result:

(gdb) l
2   #include <malloc.h>
3   
4   typedef int FN(void);
5   
6   int main(void)
7   {
8       FN * fn_ptr;
9       char * x;
10  
11      fn_ptr = malloc(10240);
12      x = (char *)fn_ptr;
13  
14      // ... Insert code into x that points the same memory of fn_ptr;
15      x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
16      fn_ptr();
17  
18      return 0;
19  }
(gdb) b 11
Breakpoint 1 at 0x400535: file p.c, line 11.
(gdb) r
Starting program: /home/sergio/a.out 

Breakpoint 1, main () at p.c:11
11      fn_ptr = malloc(10240);
(gdb) p fn_ptr
$1 = (FN *) 0x7fffffffde30
(gdb) n
12      x = (char *)fn_ptr;
(gdb) n
15      x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
(gdb) p x[0]
$3 = 0 '\000'
(gdb) n
16      fn_ptr();
(gdb) p x[0]
$5 = -21 '\353'
(gdb) p x[1]
$6 = -2 '\376'
(gdb) s

Program received signal SIGSEGV, Segmentation fault.
0x0000000000602010 in ?? ()
(gdb) where
#0  0x0000000000602010 in ?? ()
#1  0x0000000000400563 in main () at p.c:16
(gdb) 

As you can see, GDB reports a SIGSEGV (segmentation fault) at the address fn_ptr points to, even though the memory there holds valid instructions.

Note that the machine code EB FE is valid for Intel (or compatible) processors only. It corresponds to the assembly instruction jmp $.

Assume we are talking about a von Neumann architecture.

Basically, we have a single memory that contains both instructions and data. However, modern OSes can control memory access permissions (read/write/execute).

As far as the standard is concerned, casting a function pointer to a data pointer is undefined behaviour. However, with, say, Linux, gcc and a modern x86-64 CPU, you can do such a conversion, and what you get is a pointer into a read-only, executable segment of memory.

For instance take a look at this simple program:

#include <stdio.h>

int func() {
  return 1;
}

int main() {
  unsigned char * code = (void*)func;
  printf("%02x\n%02x%02x%02x\n%02x%02x%02x%02x%02x\n%02x\n%02x\n", 
      *code, 
      *(code+1), *(code+2), *(code+3), 
      *(code+4), *(code+5), *(code+6), *(code+7), *(code+8),
      *(code+9),
      *(code+10));
}

Compiled with:

gcc -O0 -o tst tst.c

Its output on my machine is (the comments are my annotations, not program output):

55         // push rbp
4889e5     // mov rbp, rsp
b801000000 // mov eax, 0x1
5d         // pop rbp
c3         // ret

As you can see, this is indeed our function.

Since the OS lets you mark memory as executable, you can in fact write your functions at runtime: all you need is to generate opcodes for the current platform and mark the memory executable. This is exactly how JIT compilers work. For an excellent example of such a compiler, take a look at LuaJIT.

This is an example of the use of function pointers where the machine code is copied into a memory area and executed.

The program below doesn't do anything special. It runs the code stored in the array prg[][] by copying it into a memory-mapped area. It uses two function pointers, fnI_ptr and fnD_ptr, both pointing to the same memory area. The program copies each of the two machine-code routines into that memory in turn and then executes the "loaded" code.

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <malloc.h>
#include <sys/mman.h>
#include <stdint.h>
#include <inttypes.h>

typedef int FNi(int,int);
typedef double FNd(double,double);

const unsigned char prg[][250] = {
    // int multiply(int x,int y)
    {
    0x55,                       // push     %rbp
    0x48,0x89,0xe5,             // mov      %rsp,%rbp
    0x89,0x7d,0xfc,             // mov      %edi,-0x4(%rbp)
    0x89,0x75,0xf8,             // mov      %esi,-0x8(%rbp)
    0x8B,0x45,0xfc,             // mov      -0x4(%rbp),%eax
    0x0f,0xaf,0x45,0xf8,        // imul     -0x8(%rbp),%eax
    0x5d,                       // pop      %rbp
    0xc3                        // retq
    },

    // double multiply(double x,double y)
    {
    0x55,                       // push     %rbp
    0x48,0x89,0xe5,             // mov    %rsp,%rbp
    0xf2,0x0f,0x11,0x45,0xf8,   // movsd  %xmm0,-0x8(%rbp)
    0xf2,0x0f,0x11,0x4d,0xf0,   // movsd  %xmm1,-0x10(%rbp)
    0xf2,0x0f,0x10,0x45,0xf8,   // movsd  -0x8(%rbp),%xmm0
    0xf2,0x0f,0x59,0x45,0xf0,   // mulsd  -0x10(%rbp),%xmm0
    0xf2,0x0f,0x11,0x45,0xe8,   // movsd  %xmm0,-0x18(%rbp)
    0x48,0x8b,0x45,0xe8,        // mov    -0x18(%rbp),%rax
    0x48,0x89,0x45,0xe8,        // mov    %rax,-0x18(%rbp)
    0xf2,0x0f,0x10,0x45,0xe8,   // movsd  -0x18(%rbp),%xmm0
    0x5d,                       // pop    %rbp
    0xc3                        // retq
    }
};

int main(void)
{
#define FMT "0x%016"PRIX64

    int ret=0;

    FNi * fnI_ptr=NULL;
    FNd * fnD_ptr=NULL;

    void * x=NULL;

    //uint64_t p = PAGE(K), l =  p*4; //Max memory to use!
    uint64_t p = 0, l =  0, line=0; //Max memory to use!

    do {
        p = getpagesize();line = __LINE__;
        if (!p) {
            ret=line;
            break;
        }

        l=p*2;
        printf("Mem page size  = "FMT"\n",p);
        printf("Mem alloc size = "FMT"\n\n",l);

        x = mmap(NULL, l, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);line = __LINE__;
        if (x==MAP_FAILED) {
            x=NULL;
            ret=line;
            break;
        }

        //Prepares function-pointers. They point the same memory! :)
        fnI_ptr=(FNi *)x;
        fnD_ptr=(FNd *)x;

        printf("from x="FMT" to "FMT"\n\n",(uint64_t)(uintptr_t)x,(uint64_t)(uintptr_t)x + l);

        // Calling the functions coded into the array prg

        puts("Copying prg[0]");

        // It injects the function prg[0]
        memcpy(x,prg[0],sizeof(prg[0]));

        // It executes the injected code
        printf("executing int-mul = %d\n",fnI_ptr(10,20));

        puts("--------------------------");
        puts("Copying prg[1]");

        // It injects the function prg[1]
        memcpy(x,prg[1],sizeof(prg[1]));

        //Prepares function pointers.

        // It executes the injected code
        printf("executing dbl-mul = %f\n\n",fnD_ptr(12.3,3.21));


    } while(0); // Fake loop, left with break when an error occurs

    if (x!=NULL)
        munmap(x,l);

    if (ret) {
        printf("[line"
               "=%d] Error %d - %s\n",ret,errno,strerror(errno));
    }
    return ret ? 1 : 0; // errno can be nonzero even when nothing failed
}

prg[][] contains two machine-code functions:

  • The first multiplies two integer values and returns an integer result.

  • The second multiplies two double-precision values and returns a double-precision result.

I won't discuss portability. The code in prg[][] was obtained by running objdump -S prgname > prgname.s on an object file produced by compiling the following code with gcc ( gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 ) without optimization:

int multiply(int a, int b)
{
    return a*b;
}

double dMultiply(double a, double b)
{
    return a*b;
}

The above code was compiled on a PC with a 64-bit Intel i3 CPU running Linux (3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64).
