
[]A\A]A^A_ and ;*3$" in compiled C binary

I'm on an Ubuntu 18.04 laptop, coding C with VSCode and compiling it with GNU's gcc.

I'm doing some basic engineering on my own C code and I noticed a few interesting details, one of which is the pair []A\A]A^A_ and ;*3$" that seems to appear in every one of my compiled C binaries. Between them are usually (or always) the strings that I hard-code for printf() calls.

An example is this short piece of code here:

#include <stdio.h>
#include <stdbool.h>

int f(int i);

int main()
{
    int x = 5;
    int o = f(x);
    printf("The factorial of %d is: %d\n", x, o);
    return 0;
}

int f(int i)
{
    if(i == 0)
    {
        return 1;   /* 0! is 1; returning i (which is 0 here) made every result 0 */
    }
    else
    {
        return i*f(i-1);
    }

}

... which is then compiled using gcc test.c -o test.

When I run strings test, the output is:

/lib64/ld-linux-x86-64.so.2
0HSn(
libc.so.6
printf
__cxa_finalize
__libc_start_main
GLIBC_2.2.5
_ITM_deregisterTMCloneTable
__gmon_start__
_ITM_registerTMCloneTable
AWAVI
AUATL
[]A\A]A^A_
The factorial of %d is: %d
;*3$"
GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
crtstuff.c
deregister_tm_clones
__do_global_dtors_aux
completed.7697
__do_global_dtors_aux_fini_array_entry
frame_dummy
__frame_dummy_init_array_entry
test.c
__FRAME_END__
__init_array_end
_DYNAMIC
__init_array_start
__GNU_EH_FRAME_HDR
_GLOBAL_OFFSET_TABLE_
__libc_csu_fini
_ITM_deregisterTMCloneTable
_edata
printf@@GLIBC_2.2.5
__libc_start_main@@GLIBC_2.2.5
__data_start
__gmon_start__
__dso_handle
_IO_stdin_used
__libc_csu_init
__bss_start
main
__TMC_END__
_ITM_registerTMCloneTable
__cxa_finalize@@GLIBC_2.2.5
.symtab
.strtab
.shstrtab
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt.got
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.dynamic
.data
.bss
.comment

As with other programs I've written, the two pieces []A\A]A^A_ and ;*3$" always pop up, one before the strings used with printf and one right after.

I'm curious: What exactly do those strings mean? I'm guessing they mainly mark the beginning and end of the hard-coded output strings.

Our digital computers work on bits, most commonly grouped into bytes of 8 bits each. The meaning of such a combination depends on the context and the interpretation.

A non-exhaustive list of possible interpretations is:

  • ASCII characters with the eighth bit ignored or accepted only if 0;
  • signed or unsigned 8-bit integer;
  • operation code (or part of it) of one specific machine language; each processor (family) has its own set.

For example, the hex value 0x43 can be seen as:

  1. ASCII character 'C';
  2. Unsigned 8-bit integer 67 (signed is the same if 2's complement is used);
  3. Operation code "LD B,E" for a Z80 CPU (see, I'm really old and learned that processor in depth);
  4. Operation code "EORS ari" for an ARM CPU.

Now strings simply (not to say "primitively") scans through the given file and tries to interpret the bytes as sequences of printable ASCII characters. By default a sequence must have at least 4 characters, and the bytes are interpreted as 7-bit ASCII. BTW, the file does not have to be an executable. You can scan any file, but if you give it an object file then, depending on the version and how it was built, it may by default scan only the sections that are loaded into memory; the -a option makes it scan the whole file.

So what you see are sequences of bytes that happen to be at least 4 printable characters in a row. Because some patterns occur in every executable, it looks as if they have a special meaning. They do have one, but it need not have anything to do with your program's strings.

You can use strings to quickly peek into a file to find, well, strings which might help you with whatever you're trying to accomplish.

What you're seeing is an ASCII representation of a particular bit pattern that happens to be common in executable programs generated by that particular compiler. The pattern might correspond to a particular sequence of machine language instructions which the compiler is fond of emitting. Or it might correspond to a particular data structure which the compiler or linker uses to mark the various other pieces of data stored in the executable.

Given enough work, it would probably be possible to work out, for your C code and your particular version of your particular compiler, precisely what the bit patterns behind []A\A]A^A_ and ;*3$" correspond to. But I don't do much machine-language programming any more, so I'm not going to try, and the answers probably wouldn't be too interesting in the end, anyway.

But it reminds me of a little quirk which I have noticed and can explain. Suppose you wrote the very simple program

int i = 12345;

If you compiled that program and ran strings on it, and if you told it to look for strings as short as two characters, you'd probably see (among lots of other short, meaningless strings), the string

90

and that bit pattern would, in fact, correspond to your variable. What's up with that?

Well, 12345 in hexadecimal is 0x3039, and most machines these days are little-endian, so those two bytes are stored in memory in the opposite order, as

39 30

and in ASCII, 0x39 is '9', while 0x30 is '0'.

And if this is interesting to you, you can try compiling the program fragment

int i = 12345;

long int a = 1936287860;
long int b = 1629516649;
long int c = 1953719668;

long long int x = 48857072035144;
long long int y = 36715199885175;

and running strings -2 on it, and see what else you get.
