
Is there a way to compile raw hex code into a binary executable in C?

I have plans for a certain program I want to build and for that I need a way to generate random assembly code and modify it.

I know how to use the system() function in C, and I wanted to know whether there is a way to create a file that contains only raw hex code and then use system() to have a tool such as NASM turn it into a binary executable.

Note: please don't answer, because I am about to post another question that fits my needs better... this one is too general for me. (Sorry for the inconvenience...)

If you want NASM to take care of the executable's metadata and format cruft, and you only want to produce the main body of code yourself, you can write a new ".asm" file to disk that starts with a header template like:

bits 64
global _start
_start:

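Then append your data as further lines (note that NASM's dw emits each 16-bit word low byte first, so 0xc3d5 ends up in the file as the bytes d5 c3):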

    dw      0x1234
    dw      0xc3d5
    ...

Save the complete file under a name such as "temp1234.asm", then assemble and link it into a Linux ELF 64-bit binary (you didn't specify your target platform and CPU in the question, so I'm using a familiar and very common platform and OS as the example; details will differ on other platforms):

nasm -f elf64 temp1234.asm; ld -b elf64-x86-64 -o temp1234 temp1234.o

(use system() to run this build step), and then you can run the resulting temp1234 binary with system() too.
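The steps above could be driven from C roughly as follows. This is only a minimal sketch, assuming nasm and ld are installed and on the PATH of a Linux x86-64 system. The helper names write_asm_file and build_with_nasm are made up for illustration, and the command uses && instead of ; so that ld runs only if nasm succeeded.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes the header template followed by one
   "dw" line per 16-bit word. Returns 0 on success, -1 on failure. */
int write_asm_file(const char *path, const unsigned short *words, size_t count)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fputs("bits 64\nglobal _start\n_start:\n", f);
    for (size_t i = 0; i < count; i++)
        fprintf(f, "    dw      0x%04x\n", (unsigned)words[i]);
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: runs the assemble-and-link step from the text.
   Returns whatever system() returns (0 normally means success). */
int build_with_nasm(void)
{
    return system("nasm -f elf64 temp1234.asm && "
                  "ld -b elf64-x86-64 -o temp1234 temp1234.o");
}
```

A caller would first call write_asm_file("temp1234.asm", words, n), then build_with_nasm(), and only if both return 0 go on to system("./temp1234").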


If you want the resulting file to contain only your data, you can write the byte values directly into a file with the C function size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream). Don't forget to open the file in binary mode, e.g. FILE *f = fopen("name", "wb");. The work-around with the temporary ASM file above is worth the effort only when you actually want the assembler and linker to also produce the metadata of a common executable format, such as ELF64.

To prepare such binary data in C you can do for example:

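#include <stdio.h>

typedef unsigned short word;

void foo(void) {
    word payload[3] = { 0x1D35, 0xC3D5, 0xA29F };
    FILE *f = fopen("temp.exe", "wb");       /* "wb": write in binary mode */
    if (f == NULL) return;                   /* could not create the file */
    fwrite(payload, 1, sizeof(payload), f);  /* write the raw bytes of the array */
    fclose(f);
}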

(Do NOT run the "exe" file this creates. It is not a valid EXE binary, because it lacks the header and metadata that the DOS and Windows EXE formats require. This is just an example of how to write binary data into a file from C code.)
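One detail worth keeping in mind with the word-based example: fwrite copies the in-memory representation, so on a little-endian machine such as x86 the word 0x1D35 lands in the file as the bytes 35 1D. When the exact byte order matters, as it does for machine code, an unsigned char array avoids the ambiguity. A minimal sketch (write_bytes is a hypothetical helper name, not something from a library):

```c
#include <stdio.h>

/* Hypothetical helper: writes len bytes to path exactly in array order.
   Returns 0 on success, -1 on any failure. */
int write_bytes(const char *path, const unsigned char *bytes, size_t len)
{
    FILE *f = fopen(path, "wb");   /* "wb": binary mode, no newline translation */
    if (f == NULL)
        return -1;
    size_t written = fwrite(bytes, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}
```

For example, passing the array { 0x35, 0x1D, 0xD5, 0xC3, 0x9F, 0xA2 } produces the same file as the word example does on a little-endian host, but the result no longer depends on the host's byte order.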


A final note: if you write pure 16-bit x86 machine code into a file named "something.COM", it can be run directly under DOS, because the "COM" executable format is simply "raw machine code loaded into a single 64k segment of memory starting at offset 0x100". For example, a "test.com" consisting of the single byte 0xC3 will execute correctly under DOS (it just returns to DOS, because 0xC3 is the opcode of the ret instruction).
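To make the "loaded at offset 0x100" part concrete, here is a slightly larger sketch than the one-byte example. It assumes the standard DOS "print string" service (int 0x21 with AH = 0x09, text terminated by '$'); the helper name write_com_file is hypothetical. I have not run the result under DOS here, so treat it as an illustration only.

```c
#include <stdio.h>

/* Raw 16-bit x86 code for a DOS .COM file. DOS loads the file at offset
   0x100, so the text that starts 8 bytes into the file sits at 0x108. */
static const unsigned char com_image[] = {
    0xB4, 0x09,                   /* mov ah, 0x09   ; DOS "print string"   */
    0xBA, 0x08, 0x01,             /* mov dx, 0x0108 ; address of the text  */
    0xCD, 0x21,                   /* int 0x21       ; call DOS             */
    0xC3,                         /* ret            ; return to DOS        */
    'H', 'e', 'l', 'l', 'o', '$'  /* text, terminated by '$'               */
};

/* Hypothetical helper: writes the image to path.
   Returns 0 on success, -1 on failure. */
int write_com_file(const char *path)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(com_image, 1, sizeof com_image, f);
    if (fclose(f) != 0 || written != sizeof com_image)
        return -1;
    return 0;
}
```

The address in the mov dx instruction has to be computed by hand (0x100 plus the 8 bytes of code), which is exactly the kind of bookkeeping an assembler normally does for you.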

For most other target platforms, though, you will have to produce much more complex executable files, with several pieces of metadata in a properly structured file header, to make them valid executables. That is another reason why an assembler plus linker is convenient when writing assembly code: besides translating the text form into machine code, they automatically produce all the header data and metadata for the executable format you are targeting.
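To give an idea of what that metadata looks like, here is a minimal sketch that wraps raw machine code in the smallest ELF64 file I would expect the Linux kernel to load: one ELF header plus one program header. It assumes a little-endian x86-64 Linux host that provides <elf.h>; the function name write_minimal_elf64 and the load address 0x400000 are my own choices for illustration. A real linker emits considerably more (sections, symbols, separate segments).

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: writes ELF header + one program header + code.
   Returns 0 on success, -1 on failure. */
int write_minimal_elf64(const char *path, const unsigned char *code, size_t code_len)
{
    const Elf64_Addr base = 0x400000;  /* assumed load address */
    const Elf64_Off code_off = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr);

    Elf64_Ehdr eh;
    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);   /* magic bytes 0x7f 'E' 'L' 'F' */
    eh.e_ident[EI_CLASS]   = ELFCLASS64;
    eh.e_ident[EI_DATA]    = ELFDATA2LSB;  /* little-endian */
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_type      = ET_EXEC;
    eh.e_machine   = EM_X86_64;
    eh.e_version   = EV_CURRENT;
    eh.e_entry     = base + code_off;      /* first byte of the code */
    eh.e_phoff     = sizeof(Elf64_Ehdr);   /* program header follows directly */
    eh.e_ehsize    = sizeof(Elf64_Ehdr);
    eh.e_phentsize = sizeof(Elf64_Phdr);
    eh.e_phnum     = 1;

    Elf64_Phdr ph;
    memset(&ph, 0, sizeof ph);
    ph.p_type   = PT_LOAD;                 /* map the whole file into memory */
    ph.p_flags  = PF_R | PF_X;
    ph.p_offset = 0;
    ph.p_vaddr  = base;
    ph.p_paddr  = base;
    ph.p_filesz = code_off + code_len;
    ph.p_memsz  = code_off + code_len;
    ph.p_align  = 0x1000;

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t ok = fwrite(&eh, sizeof eh, 1, f)
              + fwrite(&ph, sizeof ph, 1, f)
              + fwrite(code, 1, code_len, f);
    if (fclose(f) != 0 || ok != 2 + code_len)
        return -1;
    return chmod(path, 0755);              /* mark the file executable */
}
```

With the 12 bytes b8 3c 00 00 00 bf 2a 00 00 00 0f 05 (mov $60, %eax; mov $42, %edi; syscall) as the code, the resulting program should simply exit with status 42. Its entry point comes out as 0x400078, because the two headers occupy 0x78 bytes.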

_start:
    mov $1, %rax # write
    lea .foo, %rsi # text
    mov $6, %rdx # text size
    mov $1, %rdi # stdout
    syscall

    mov $60, %rax #exit
    syscall

 .foo: .ascii "Hello\n"

Here's some assembly code (sorry, it's in AT&T syntax, that's what I use; you asked for machine code anyway).

/tmp> as x.S -o x.o
/tmp> ld x.o -o x
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078
/tmp> ./x
Hello

So that I know it works…

/tmp> objdump -d x | awk 'BEGIN{ printf " _start: .byte " } /  [0-9a-f]+:/ { i=2; while( $i ~ /^[0-9a-f]{2}$/ ){ printf "0x%s, ", $i; i++ } } END{ print "" }' > y.s
/tmp> cat y.s
 _start: .byte 0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00, 0x48, 0x8d, 0x34, 0x25, 0xa0, 0x00, 0x40, 0x00, 0x48, 0xc7, 0xc2, 0x06, 0x00, 0x00, 0x00, 0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x05, 0x48, 0xc7, 0xc0, 0x3c, 0x00, 0x00, 0x00, 0x0f, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x0a, 

That's how I extract the machine code bytes and transform them into assembler-readable syntax. And finally:

/tmp> as y.s -o y.o
y.s: Assembler messages:
y.s:1: Warning: zero assumed for missing expression
/tmp> ld y.o -o y
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078
/tmp> ./y
Hello

Now do it in C. :)
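As a starting point for the C side, here is a minimal sketch of the step that the awk one-liner performs: turning a buffer of machine code bytes into a source line for GNU as. The function name write_byte_listing is hypothetical. Unlike the one-liner, it declares _start global and leaves out the trailing comma, which should avoid the ld and as warnings shown above.

```c
#include <stdio.h>

/* Hypothetical helper: emits GNU as source that reproduces the given
   bytes at _start. Returns 0 on success, -1 on a write error. */
int write_byte_listing(FILE *out, const unsigned char *bytes, size_t len)
{
    if (fputs(".globl _start\n_start: .byte ", out) == EOF)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* separate values with ", " but emit no trailing comma */
        if (fprintf(out, "%s0x%02x", i ? ", " : "", (unsigned)bytes[i]) < 0)
            return -1;
    }
    return fputc('\n', out) == EOF ? -1 : 0;
}
```

Writing the output to y.s and then running as and ld through system() would complete the round trip.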
