
How to find the “exit” of a C program

The test is on 32-bit x86 Linux.

So basically I am trying to log information about executed basic blocks by inserting instrumentation instructions into the assembly code.

My strategy is this: write the index of each executed basic block into a global array, and flush the array from memory to disk when the array is full (16M).
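To make the strategy concrete, here is a minimal sketch in C of such a buffer. The names `trace_record`, `trace_flush` and the output file `bb_trace.bin` are hypothetical, and reading "16M" as 16 MiB of 32-bit indices is an assumption; the instrumented assembly would call something like `trace_record` at the top of each basic block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: "16M" means 16 MiB of 32-bit block indices (4M entries). */
#define TRACE_CAPACITY (4u * 1024u * 1024u)

static uint32_t trace_buf[TRACE_CAPACITY];
static size_t trace_len;
static const char *trace_path = "bb_trace.bin"; /* hypothetical output file */

/* Append the buffered indices to the trace file and empty the buffer.
 * Returns 0 on success, -1 on I/O failure. */
int trace_flush(void)
{
    if (trace_len == 0)
        return 0;

    FILE *f = fopen(trace_path, "ab");
    if (f == NULL)
        return -1;

    size_t written = fwrite(trace_buf, sizeof trace_buf[0], trace_len, f);
    int rc = (written == trace_len) ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;

    trace_len = 0;
    return rc;
}

/* Record one executed basic block; flush automatically when full. */
void trace_record(uint32_t block_id)
{
    trace_buf[trace_len++] = block_id;
    if (trace_len == TRACE_CAPACITY)
        trace_flush();
}
```

The open question in this post is then where to place the final `trace_flush()` call so that a partially filled buffer is not lost.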

Here is my problem: I need to flush the array to disk when execution of the instrumented binary ends, even if it has not reached the 16M boundary. However, I don't know where to find the exit of an assembly program.

I tried this:

  1. grep for exit in the target assembly program, and flush the memory right before the call exit instruction. But according to my debugging experience, the target C program, say, an md5sum binary, does not call exit when it finishes execution.

  2. Flush the memory at the end of the main function. However, in the assembly code, I don't know where the exact end of main is. I could take a conservative approach, say, looking for every ret instruction, but it seems to me that not every main function ends with a ret instruction.

So here is my question: how can I identify the exact execution end of an assembly program and insert some instrumentation instructions there? Hooking some library code is fine with me. I understand that with different inputs the binary could exit at different positions, so I guess I need some conservative estimation. Am I clear? Thanks!

I believe you cannot do that in the general case. First, if main returns a value, that value is the exit code (if main has no explicit return, recent C standards require the compiler to add an implicit return 0;). Second, a function could store the address of exit in some data (e.g. a global function pointer, a field in a struct, ...), and some other function could call it indirectly through that function pointer. In practice, a program can also load plugins using dlopen and use dlsym to look up the "exit" name, or simply call exit inside the plugin, etc... AFAIU, solving that problem (of finding actual exit calls, in the dynamic sense) in full generality can be proved equivalent to the halting problem. See also Rice's theorem.
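As a small illustration of the function-pointer case, here is a sketch in plain C. The names `callbacks`, `program_callbacks` and `finish` are made up for this example. The address of exit is stored in a struct field, and `finish` terminates the program through that field, so a textual search for "call exit" would typically not find the call site (whether the compiler emits an indirect call depends on the optimization level).

```c
#include <stdlib.h>

/* A table of callbacks, as many programs keep for hooks or plugins. */
struct callbacks {
    void (*on_done)(int status);
};

/* The address of exit() is stored as data, not called directly. */
struct callbacks program_callbacks = { exit };

/* Terminates the process through the stored pointer. */
void finish(int status)
{
    program_callbacks.on_done(status);
}
```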

Without claiming an exhaustive approach, I would suggest something else (assuming you are interested in instrumenting programs coded in C or C++, etc... whose source code is available to you). You could customize the GCC compiler with MELT to change the basic blocks processed inside GCC to call some of your instrumentation functions. It is not trivial, but it is doable... Of course you'll need to recompile some C code with such a customized GCC to instrument it.

(Disclaimer, I am the main author of MELT ; feel free to contact me for more...)

BTW, do you know about atexit(3)? It could be helpful for your flushing issue... And you might also use LD_PRELOAD tricks (read about dynamic linkers, see ld-linux(8)).
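A minimal sketch of how atexit and LD_PRELOAD could be combined, assuming GCC or Clang (the constructor attribute is a compiler extension): a small shared library whose constructor runs when the library is loaded and registers a flush handler. The names `flush_trace_at_exit`, `install_flush_handler` and `handler_installed` are hypothetical, and the handler body is only a placeholder.

```c
#include <stdlib.h>

/* Set to 1 once atexit() has accepted the handler. */
static int handler_installed;

/* Placeholder: real code would write the remaining buffer to disk here. */
static void flush_trace_at_exit(void)
{
}

/* Runs when the library is loaded, before main() of the target starts. */
__attribute__((constructor))
static void install_flush_handler(void)
{
    handler_installed = (atexit(flush_trace_at_exit) == 0);
}
```

Built as a 32-bit shared object (for example with gcc -m32 -shared -fPIC -o libflush.so flush.c, file names being placeholders) and loaded with LD_PRELOAD=./libflush.so, this would register the handler without touching the target binary. It only helps for programs that terminate through exit().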

atexit() will properly handle 95+% of programs. You can either modify its chain of registered handlers, or instrument it as you do other blocks. However, some programs may terminate by calling _exit(), which does not invoke atexit handlers. Instrumenting _exit to invoke data flushing and installing an atexit handler (or an on_exit() handler, a SunOS-derived extension that glibc also provides) should probably cover nearly 100% of programs.
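The difference between exit() and _exit() can be checked with a small experiment. The sketch below uses only POSIX calls; `handler_runs_on` and `flush_handler` are hypothetical names. It forks a child, registers an atexit handler there, terminates the child with the function passed in, and reports whether the handler ran. A pipe carries the result because the child's memory is not visible to the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int notify_fd = -1;

/* Stand-in for the real flush: tells the parent that the handler ran. */
static void flush_handler(void)
{
    char ok = 1;
    if (write(notify_fd, &ok, 1) != 1) {
        /* Nothing sensible to do this late in shutdown. */
    }
}

/* Returns 1 if the atexit handler ran when the child terminated via
 * `terminate`, 0 if it did not, and -1 on error. */
int handler_runs_on(void (*terminate)(int))
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: the "instrumented program" */
        close(fds[0]);
        notify_fd = fds[1];
        if (atexit(flush_handler) != 0)
            _exit(2);
        terminate(0);
        _exit(3);                   /* not reached */
    }

    close(fds[1]);                  /* parent: wait for a byte or EOF */
    char byte = 0;
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1;
}
```

Calling `handler_runs_on(exit)` should report that the handler ran, while `handler_runs_on(_exit)` should report that it did not, which is why _exit needs its own instrumentation.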


Addendum: Note that the Linux Standard Base specification says that the C library startup shall:

call the initializer function (*init)().
call main() with appropriate arguments.
call exit() with the return value from main().

A method that should work every time would be to create a shared memory section and store your data there.

You also create a child process which is waiting for the process being debugged to finish.

As soon as the process being debugged has finished, the child process finalizes the write operations using the data that is in the shared memory.

This should work for all forms of exit and process interruption (e.g. Ctrl+C, closing the terminal window, ...), and even if the process has been killed using "kill".
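A minimal sketch of the shared-memory idea, with one variation stated plainly: here the watcher is the parent and the traced code runs in the child, because a parent can simply waitpid() for its child. All names (`shared_trace`, `run_with_watcher`, `demo_target`) are hypothetical, and `demo_target` is a stand-in that records three block indices and then kills itself with SIGKILL, so no exit handler of its own could run.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHARED_CAPACITY 1024u

struct shared_trace {
    volatile uint32_t len;          /* number of valid entries in ids[] */
    uint32_t ids[SHARED_CAPACITY];
};

/* Hypothetical stand-in for the instrumented program. */
void demo_target(struct shared_trace *t)
{
    for (uint32_t id = 1; id <= 3; id++)
        t->ids[t->len++] = id;
    kill(getpid(), SIGKILL);        /* die without any cleanup */
}

/* Runs `target` in a child that writes into shared memory, waits for it
 * to end in any way, then writes the recorded entries to `path`.
 * Returns the number of entries written, or -1 on error. */
long run_with_watcher(void (*target)(struct shared_trace *), const char *path)
{
    struct shared_trace *t = mmap(NULL, sizeof *t, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (t == MAP_FAILED)
        return -1;
    t->len = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(t, sizeof *t);
        return -1;
    }
    if (pid == 0) {                 /* child: traced code */
        target(t);
        _exit(0);
    }

    long result = -1;               /* parent: watcher */
    if (waitpid(pid, NULL, 0) == pid) {
        uint32_t n = t->len;
        if (n > SHARED_CAPACITY)
            n = SHARED_CAPACITY;
        FILE *f = fopen(path, "wb");
        if (f != NULL) {
            size_t written = fwrite(t->ids, sizeof t->ids[0], n, f);
            if (fclose(f) == 0 && written == n)
                result = (long)n;
        }
    }
    munmap(t, sizeof *t);
    return result;
}
```

For a real target the child would exec the instrumented binary, which would then need a named shared memory object instead of an anonymous mapping, since anonymous mappings do not survive exec.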

But according to some debugging experience, the target C program, say, a md5sum binary, does not call exit when it finishes the execution.

Let's take a look at a md5sum binary on an i686 GNU/Linux system:

In the disassembly ( objdump -d /usr/bin/md5sum ) we have this:

Disassembly of section .text:

08048f50 <.text>:
 8048f50:       55                      push   %ebp
 8048f51:       89 e5                   mov    %esp,%ebp
 8048f53:       57                      push   %edi
 8048f54:       56                      push   %esi
 8048f55:       53                      push   %ebx
 8048f56:       83 e4 f0                and    $0xfffffff0,%esp
 8048f59:       81 ec c0 00 00 00       sub    $0xc0,%esp
 8048f5f:       8b 7d 0c                mov    0xc(%ebp),%edi

[ ... ]

 8049e8f:       68 b0 d6 04 08          push   $0x804d6b0
 8049e94:       68 40 d6 04 08          push   $0x804d640
 8049e99:       51                      push   %ecx
 8049e9a:       56                      push   %esi
 8049e9b:       68 50 8f 04 08          push   $0x8048f50
 8049ea0:       e8 4b ef ff ff          call   8048df0 <__libc_start_main@plt>
 8049ea5:       f4                      hlt    

This is all startup boilerplate code. The actual program's main function is invoked inside the call to __libc_start_main. If the program returns from that, then, hey look, there is a hlt instruction. That's your target. Look for that hlt instruction and instrument it as the end of the program.

You could try this:

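#include <iostream>
#include <string>
using namespace std;

int main()
{
    bool keepGoing = true;
    // Keep reading words until the user types "stop" (or input ends).
    while (keepGoing) {
        string x;
        if (!(cin >> x) || x == "stop") {
            keepGoing = false;
        }
    }
}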

even though it is primitive... I probably butchered the coding but it's just a concept.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 