embed a function's assembly code in a struct

I have a rather special question: is it possible in C/C++ (both, because I am sure the question is the same in both languages) to specify a function's location? Why? I have a very large list of function pointers, and I want to eliminate them.

Currently this looks like the following (repeated over a million times, stored in the user's RAM):

struct {
    int i;
    void(* funptr)();
} test;

Because I know that in most assembly languages, functions are just "goto" directives, I had the following idea. Is it possible to optimize the above construct so that it looks like this?

struct {
    int i;
    // embed the assembler of the function here
    // so that all the functions
    // instructions are located here
    // like this: mov rax, rbx
    // jmp _start ; just demo code
} test2;

In the end, the thing should look like this in memory: an int holding any value, followed by the function's assembly code, referenced by test2. I should be able to call these functions like this: ((void(*)()) (&pointerToTheStruct + sizeof(int)))();
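
For what it's worth, such a call would have to do its pointer arithmetic in bytes. A hedged illustration of the intended call (the function and parameter names are invented, and this assumes the bytes after the int really are valid, executable machine code, which plain C/C++ cannot guarantee):

// Assumed layout: an int immediately followed by the function's machine code.
void call_embedded(void *obj) {
    // Cast through char* so "+ sizeof(int)" advances by bytes; adding
    // sizeof(int) to a struct pointer would advance by whole structs.
    void (*fp)() = (void (*)())((char *)obj + sizeof(int));
    fp();
}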

You might think that I'm insane to optimize the app that way, and I cannot disclose any more details on its function, but if anyone has some pointers on how to solve this problem, I would appreciate it. I do not think that there is a standard way to do this, so any hacky way to do it via inline assembler/other crazy things is also appreciated!

The only thing you really have to do is make the compiler aware of the (constant) value of the function pointer you want in the struct. The compiler will then (presumably/hopefully) inline that function call wherever it sees it called through that function pointer:

template<void(*FPtr)()>
struct function_struct {
    int i;
    static constexpr auto funptr = FPtr;
};

void testFunc()
{
    volatile int x = 0;
}

using test = function_struct<testFunc>;

int main()
{
    test::funptr();
}

Demo - no call or jmp after optimization.

It remains unclear what the point of the int i is. Note that the code is not technically "directly after the i" here, but it is even more unclear what you expect instances of the struct to look like (is the code in them, or is it "static" in a way? I feel there is some misunderstanding on your part about what compilers actually produce...). But consider the ways that compiler inlining can help you and you might find the solution you need. If you're worried about executable size after inlining, tell the compiler to optimize for size and it will compromise between speed and size.

This sounds like a terrible idea for a lot of reasons: it probably won't save memory, and it will hurt performance by diluting L1I-cache with data and L1D-cache with code. And it gets worse if you ever modify or copy objects: self-modifying code stalls.

But yes, this would be possible in C99/C11 with a flexible array member at the end of the struct, which you cast to a function pointer.

struct int_with_code {
    int i;
    char code[];   // C99 flexible array member.  GNU extension in C++
                   // Store machine code here
                   // you can't get the compiler to do this for you.  Good Luck!
};

void foo(struct int_with_code *p) {
    // explicit C-style cast compiles as both C and C++
    void (*funcp)(void) = ( void (*)(void) ) p->code;
    funcp();
}

Compiler output from clang 7.0 on the Godbolt compiler explorer is the same when compiled as either C or C++. This is targeting the x86-64 System V ABI, where the first function arg is passed in RDI.

# this is the code that *uses* such an object, not the code that goes in its code[]
# This proves that it compiles,
#  without showing any way to get compiler-generated code into code[]
foo:                                    # @foo
    add     rdi, 4         # move the pointer 4 bytes forward, to point at code[]
    jmp     rdi                     # TAILCALL

(If you leave out the (void) arg-type declaration in C, the compiler will zero AL first, because in the x86-64 SysV calling convention the function might actually be variadic, and AL tells a variadic callee how many FP args are passed in registers; here there are none.)


You'd have to allocate your objects in memory that is executable (normally not done unless they're const with static storage), eg by compiling with gcc -z execstack. Or use mmap/mprotect (POSIX) or VirtualAlloc/VirtualProtect (Windows) to get executable memory at runtime.
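
As a minimal sketch of that runtime-allocation route (my own illustration; the function name is invented, POSIX only), one could map a readable/writable/executable region and copy machine code into it before casting:

#include <sys/mman.h>
#include <cstring>
#include <cstddef>

// Returns a pointer to an executable copy of `len` bytes of machine code,
// or nullptr on failure.
void *alloc_executable_copy(const void *code, size_t len) {
    void *p = mmap(nullptr, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return nullptr;
    std::memcpy(p, code, len);
    // A hardened version would mprotect() the region to PROT_READ|PROT_EXEC
    // (W^X) after copying.
    return p;
}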

Or if your objects are all statically allocated, it might be possible to massage compiler output to turn functions in the .text section into objects by adding an int member right before each one. Maybe with some .section and linker tricks, and maybe a linker script, you could even somehow automate it.

But unless they're all the same length (eg with padding like char code[60]), that won't form an array you can index, so you'll need some way of referencing all these variable-length objects.
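
For illustration, a hedged sketch of the fixed-length variant that 60-byte example alludes to (struct name invented); padding every function's code out to the same size is what would make the objects indexable as an array:

struct int_with_code_fixed {
    int  i;
    char code[60];   // machine code, padded out to 60 bytes (eg with 0xCC or NOPs)
};

// int_with_code_fixed table[1000];   // &table[k].code[0] is now computable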

There are potentially huge performance downsides if you ever modify an object before calling its function: on x86 you'll get a self-modifying-code pipeline nuke for executing code near a just-written memory location.

Or if you copied an object before calling its function: x86 pipeline flush, or on other ISAs you need to manually flush caches to get the I-cache in sync with D-cache (so the newly-written bytes can be executed). But you can't copy such objects because their size isn't stored anywhere. You can't search the machine code for a ret instruction, because a 0xc3 byte might appear somewhere that's not the start of an x86 instruction. Or on any ISA, the function might have multiple ret instructions (tail duplication optimization). Or end with a jmp instead of a ret (tailcall). Storing a size would start to defeat the purpose of saving size, eating up at least an extra byte in each object.

Writing code to an object at runtime, then casting to a function pointer, is undefined behaviour in ISO C and C++. On GNU C/C++, make sure you call __builtin___clear_cache on it to sync caches or whatever else is necessary. Yes, this is needed even on x86 to disable dead-store elimination optimizations: see this test case. On x86 it's just a compile-time thing, no extra asm. It doesn't actually clear any caches.
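
A hedged sketch of that GNU C/C++ pattern (the function and parameter names are invented; buf must already point at writable and executable memory):

#include <cstring>
#include <cstddef>

void write_and_call(char *buf, const unsigned char *machine_code, size_t len) {
    std::memcpy(buf, machine_code, len);
    // GNU builtin: on x86 this only stops the compiler from optimizing the
    // stores away; on other ISAs it also emits the needed cache-sync code.
    __builtin___clear_cache(buf, buf + len);
    ((void (*)())buf)();
}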

If you do copy at runtime startup, maybe allocate a big chunk of memory and carve out variable-length chunks of it, while copying. If you malloc each separately, you're wasting memory-management overhead on it.
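
A minimal bump-allocator sketch of that "one big chunk" idea (names invented; base is assumed to point at an executable region, eg obtained as in the mmap sketch above):

#include <cstring>
#include <cstddef>

struct CodePool {
    char  *base;   // start of the executable region
    size_t used;   // bytes handed out so far
    size_t cap;    // total size of the region
};

// Copy `len` bytes of machine code into the pool and return where it landed,
// or nullptr if the pool is full.
void *pool_copy(CodePool &pool, const void *code, size_t len) {
    if (pool.used + len > pool.cap)
        return nullptr;
    void *dst = pool.base + pool.used;
    std::memcpy(dst, code, len);
    pool.used += len;
    return dst;
}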


This idea will not save you memory unless you have about as many functions as you have objects

Normally you have a fairly limited number of actual functions, with many objects having copies of the same function pointer. (You've kind of hand-rolled C++ virtual functions, but with only one function you just have a function pointer directly instead of a vtable pointer to a table of pointers for that class type. One fewer level of indirection, and apparently you're not passing the object's own address to the function.)

One of the several benefits of this level of indirection is that one pointer is usually significantly smaller than the entire code for a function. For that to not be the case, your functions would have to be tiny.

Example: with 10 different functions of 32 bytes each, and 1000 objects with function pointers, you have a total of 320 bytes of code (which will stay hot in I-cache), and 8000 bytes of function pointers. (And in your objects, another 4 bytes per object wasted on padding to align the pointer, making the total size 16 instead of 12 bytes per object.) Anyway, that's 16320 bytes total for entire structs + code. If you allocated each object separately, there's per-object bookkeeping.
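
A quick check of that padding claim (this assumes a typical x86-64 ABI: 4-byte int, 8-byte function pointer with 8-byte alignment):

struct with_ptr {
    int i;              // 4 bytes
                        // + 4 bytes padding so the pointer is 8-byte aligned
    void (*funptr)();   // 8 bytes
};
static_assert(sizeof(with_ptr) == 16, "12 bytes of members round up to 16");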

With inlining machine code into each object, and no padding, that's 1000 * (4+32) = 36000 bytes, over twice the total size.

x86-64 is probably a best-case scenario, where a pointer is 8 bytes and x86-64 machine code uses a (famously complex) variable-length instruction encoding which allows for high code density in some cases, especially when optimizing for code-size (eg code-golfing: https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code ). But unless your functions are mostly something trivial like lea eax, [rdi + rdi*2] (3 bytes = opcode + ModRM + SIB) / ret (1 byte), they're still going to take more than 8 bytes. (That's return x*3; for a function that takes a 32-bit integer x arg, in the x86-64 System V ABI.)
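
For concreteness, the tiny function that asm corresponds to (standard C/C++; the size figures assume the x86-64 System V ABI and an optimizing compiler):

// Typically compiles to:  lea eax, [rdi + rdi*2]  /  ret   (3 + 1 = 4 bytes)
int triple(int x) { return x * 3; }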

If they're wrappers for larger functions, a normal call rel32 instruction is 5 bytes. A load of static data is at least 6 bytes (opcode + modrm + rel32 for a RIP-relative addressing mode; loading EAX specifically can use the special no-modrm encoding for an absolute address, but in x86-64 that's a 64-bit absolute unless you also use an address-size prefix, potentially causing an LCP stall in the decoders on Intel. mov eax, [32-bit absolute address] = addr32 (0x67) + opcode + abs32 = 6 bytes again, so this is worse for no benefit).

Your function-pointer type doesn't have any args (assuming this is C++, where foo() means foo(void) in a declaration, not like old C where an empty arg list is somewhat similar to (...)). Thus we can assume you're not passing args, so to do anything useful the functions are probably accessing some static data or making another call.


Ideas that make more sense:

  • Use an ILP32 ABI like Linux x32, where the CPU runs in 64-bit mode but your code uses 32-bit pointers. This would make each of your objects only 8 bytes instead of 16. Avoiding pointer-bloat is a classic use-case for x32 or ILP32 ABIs in general.

    Or (yuck) compile your code as 32-bit. But then you have obsolete 32-bit calling conventions that pass args on the stack instead of registers, only half as many registers, and much higher overhead for position-independent code. (No EIP/RIP-relative addressing.)

  • Store an unsigned int table index to a table of function pointers. If you have 100 functions but 10k objects, the table is only 100 pointers long. In asm you could index an array of code directly (computed goto style) if all the functions were padded to the same length, but in C++ you can't do that. An extra level of indirection with a table of function pointers is probably your best bet.

eg

void (*const fptrs[])(void) = {
    func1, func2, func3, ...
};

struct int_with_func {
    int i;
    unsigned f;
};

void bar(struct int_with_func *p) {
    fptrs[p->f] ();
}

clang/gcc -O3 output:

 bar(int_with_func*):
    mov     eax, dword ptr [rdi + 4]            # load p->f
    jmp     qword ptr [8*rax + fptrs] # TAILCALL    # index the global table with it for a memory-indirect jmp

If you were compiling a shared library, PIE executable, or not targeting Linux, the compiler couldn't use a 32-bit absolute address to index a static array with one instruction. So there'd be a RIP-relative LEA in there and something like jmp [rcx+rax*8].

This is an extra level of indirection vs. storing a function pointer in each object, but it lets you shrink each object to 8 bytes, down from 16, like using 32-bit pointers. Or to 5 or 6 bytes, if you use an unsigned short or uint8_t and pack the structs with __attribute__((packed)) in GNU C.
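
A hedged sketch of that packed variant (GNU C/C++ extension; the struct name is invented). With a one-byte index each object shrinks to 5 bytes, at the cost of limiting you to 256 functions and unaligned int access in an array of these:

struct __attribute__((packed)) int_with_func_small {
    int           i;   // 4 bytes
    unsigned char f;   // 1 byte: index into fptrs[] (at most 256 functions)
};
static_assert(sizeof(int_with_func_small) == 5, "packed down to 5 bytes");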

No, not really.

The way to specify a function's location is to use a function pointer, which you're already doing.

You could make different types which have their own different member functions, but then you're back to the original problem.

I have in the past experimented with auto-generating (as a pre-build step, using Python) a function with a long switch statement that does the work of mapping int i to a normal function call. This gets rid of the function pointers, at the expense of branching. I don't remember whether it ended up being worthwhile in my case and, even if I did, that wouldn't tell us whether it's worthwhile in your case.
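
A hedged sketch of that generated-switch idea (function names are invented; in practice the switch body would be emitted for every known function by the pre-build script):

void func0();
void func1();
void func2();

void dispatch(int i) {
    switch (i) {
    case 0: func0(); break;
    case 1: func1(); break;
    case 2: func2(); break;
    // ... one case per function, generated ...
    default: break;
    }
}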

Because I know that in most assembly languages, functions are just "goto" directives

Well, it's perhaps a little more complicated than that…

You might think that I'm insane to optimize the app that way

Perhaps. Trying to eliminate indirection is not, in itself, a bad thing, so I don't think you're wrong to try to improve this. I just don't think that you necessarily can.

but if anyone has some pointers

lol

I don't understand the goal of this "optimization". Is it about saving memory?

I might be misunderstanding the question, but if you just replace your function pointer with a regular member function, then your struct will contain only the int as data; the function's address is supplied by the compiler when you take it, instead of being stored per object in memory.

So just do

struct {
    int i;
    void func();
} test;  

Then sizeof(test)==sizeof(int) should hold true if you set alignment/packing to be tight.
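
A quick hedged check of that claim (C++; a non-virtual member function occupies no per-object storage, so on typical ABIs no special packing is even needed):

struct test_t {
    int i;
    void func();   // declared once; its code lives in the text segment,
                   // not inside each object
};
static_assert(sizeof(test_t) == sizeof(int), "no per-object storage for func");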
