How to find the “exit” of a C program

Question

The test is on 32-bit x86 Linux.

So basically I am trying to log information about executed basic blocks by inserting instrumentation instructions into the assembly code.

My strategy is this: write the index of each executed basic block into a global (.globl) array, and flush the array from memory to disk when it is full (16M).
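For reference, the runtime side of this strategy might look like the minimal C sketch below. The names bb_buf, bb_count, bb_out, bb_record and bb_flush are hypothetical, and it assumes "16M" means 16 MiB of 32-bit indices (4M entries). The code inserted at each basic block would call bb_record with that block's index; the open question is where to call bb_flush one last time.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: "16M" = 16 MiB of 32-bit indices, i.e. 4M entries. */
#define BB_CAPACITY ((16u * 1024u * 1024u) / sizeof(uint32_t))

uint32_t bb_buf[BB_CAPACITY];  /* global array written by the instrumentation */
size_t   bb_count;             /* number of valid entries in bb_buf */
FILE    *bb_out;               /* hypothetical trace file, opened by the runtime */

/* Write the buffered indices to disk and reset the buffer.
   If no trace file is open, the buffered data is simply dropped. */
void bb_flush(void)
{
    if (bb_out != NULL && bb_count > 0) {
        fwrite(bb_buf, sizeof bb_buf[0], bb_count, bb_out);
        fflush(bb_out);
    }
    bb_count = 0;
}

/* Called from the code inserted at the start of each basic block. */
void bb_record(uint32_t index)
{
    if (bb_count == BB_CAPACITY)   /* array full: spill it to disk first */
        bb_flush();
    bb_buf[bb_count++] = index;
}
```

Anything still in bb_buf when the process ends is lost unless bb_flush runs on every exit path, which is exactly the problem described next.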

Here is my problem. I need to flush the array to disk when execution of the instrumented binary is over, even if it has not reached the 16M boundary. However, I just don't know where to find the exit of an assembly program.

I tried this:

Grep the target assembly program for exit, and flush the memory right before the call exit instruction. But from some debugging experience, the target C program, say, an md5sum binary, does not call exit when it finishes execution.
Flush the memory at the end of the main function. However, in the assembly code I just don't know where exactly main ends. I could take a conservative approach, say, looking for every ret instruction, but it seems to me that not every main function ends with a ret instruction.

So here is my question: how can I identify the exact point where execution of the assembly code ends, and insert some instrumentation instructions there? Hooking some library code is fine with me. I understand that with different inputs the binary could exit at different places, so I guess I need some conservative estimation. Am I clear? Thanks!

Answer 1

I believe you cannot do that in the general case. First, if main returns some value, that value is the exit code (if main has no explicit return, recent C standards require the compiler to add an implicit return 0;). Then a function could store the address of exit in some data (e.g. a global variable, a field in a struct, ...), and some other function could call it indirectly through a function pointer. In practice, a program can load plugins using dlopen and use dlsym to look up the "exit" name, or simply call exit inside the plugin, etc. AFAIU, solving that problem (finding the actual exit calls, in the dynamic sense) in full generality can be proved equivalent to the halting problem. See also Rice's theorem.
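To make the function-pointer case concrete, here is a small hypothetical C example (shutdown_hooks, hooks and finish are made-up names). The program still terminates through exit, but the call site goes through a global variable, so grepping the generated assembly for "call exit" would typically find nothing there.

```c
#include <stdlib.h>

/* Hypothetical: the address of exit() is stored as data. */
struct shutdown_hooks {
    void (*quit)(int status);
};

/* Non-static and non-const, so the compiler cannot assume its value
   at the call site and has to load the pointer from memory. */
struct shutdown_hooks hooks = { exit };

/* Typically compiles to an indirect call or jump through memory,
   not to a direct "call exit". */
void finish(int status)
{
    hooks.quit(status);
}
```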

Without claiming an exhaustive approach, I would suggest something else (assuming you are interested in instrumenting programs written in C, C++, etc. whose source code is available to you). You could customize the GCC compiler with MELT to change the basic blocks processed inside GCC so that they call some of your instrumentation functions. It is not trivial, but it is doable... Of course, you'll need to recompile the C code with such a customized GCC to instrument it.

(Disclaimer: I am the main author of MELT; feel free to contact me for more...)

BTW, do you know about atexit(3)? It could help with your flushing issue... You might also use LD_PRELOAD tricks (read about dynamic linkers; see ld-linux(8)).
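A minimal sketch of the atexit route, assuming GCC or Clang (for __attribute__((constructor))) on glibc. The names trace_final_flush, trace_install_hook and trace_hook_installed are hypothetical, and the flush body is only a stand-in where a real runtime would write its partially filled buffer. Because the handler is registered from a constructor, the same file could be built as a shared object and injected with LD_PRELOAD without touching the target's main.

```c
#include <stdio.h>
#include <stdlib.h>

int trace_hook_installed;   /* set once atexit() registration succeeded */

/* Hypothetical final flush: a real runtime would fwrite() the partially
   filled basic-block buffer here. This stand-in only reports that it ran. */
static void trace_final_flush(void)
{
    fputs("trace: final flush\n", stderr);
}

/* Runs before main(), also when this file is built as a shared object
   and injected into a dynamically linked program with LD_PRELOAD. */
__attribute__((constructor))
static void trace_install_hook(void)
{
    if (atexit(trace_final_flush) == 0)
        trace_hook_installed = 1;
}
```

For example, build it with gcc -shared -fPIC -o libtrace.so trace.c and run LD_PRELOAD=./libtrace.so ./instrumented_binary. The handler then runs when the program calls exit() or returns from main, but not when it calls _exit() or is killed by a signal.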

Answer 2

atexit() will properly handle 95+% of programs. You can either modify its chain of registered handlers, or instrument it as you are instrumenting the other blocks. However, some programs may terminate by calling _exit(), which does not invoke atexit handlers. Instrumenting _exit to flush the data, plus installing an atexit handler (or an on_exit() handler, a SunOS-derived extension that glibc also provides), should cover nearly 100% of programs.
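One way to cover _exit() as well is to interpose it, sketched below under stated assumptions: Linux with glibc, and a dynamically linked target into which this file is injected via LD_PRELOAD (or linked directly). Instead of looking up libc's _exit with dlsym(RTLD_NEXT, ...), this sketch terminates with the exit_group system call itself, which is what glibc's _exit does on Linux. trace_final_flush and trace_flushed are hypothetical names; the flush is idempotent so that reaching it from both an atexit handler and the _exit wrapper is harmless.

```c
#define _DEFAULT_SOURCE        /* for syscall() on glibc */
#include <sys/syscall.h>
#include <unistd.h>

int trace_flushed;             /* set once the final flush has run */

/* Hypothetical final flush. Idempotent, so several exit paths
   (an atexit handler and this wrapper) write the data only once. */
void trace_final_flush(void)
{
    if (trace_flushed)
        return;
    trace_flushed = 1;
    /* Write the partially filled buffer here, preferably with write(2):
       _exit is often called in a forked child, where stdio state is
       unreliable, and that child would also flush its copy of the buffer. */
}

/* Interposed _exit(): flush first, then terminate the whole process
   with exit_group, as glibc's own _exit does on Linux. */
void _exit(int status)
{
    trace_final_flush();
    syscall(SYS_exit_group, status);
    for (;;)
        ;                      /* not reached; keeps _exit's noreturn contract */
}
```

Calls to _Exit(), raw exit_group system calls and fatal signals may still bypass this wrapper.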

Addendum: note that the Linux Standard Base specification says that the C library startup shall:

call the initializer function (*init)().
call main() with appropriate arguments.
call exit() with the return value from main().

Answer 3

A method that should work every time is to create a shared memory section and store your data there.

You also create a child process that waits for the process being debugged to finish.

As soon as the process being debugged has finished, the child process completes the write operations using the data in the shared memory.

This should work for all forms of exit and process interruption (e.g. Ctrl+C, closing the terminal window, ...), and even if the process has been killed using "kill".
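The watcher idea can be sketched in C as follows, assuming Linux with glibc. One detail differs from the description: a child cannot waitpid() on its parent, so this sketch detects the traced process's death by waiting for end-of-file on a pipe whose write end only the traced process holds; the kernel closes that descriptor however the process dies, SIGKILL included. The names trace, trace_record and trace_start_watcher, the 1M-entry capacity and the output path are hypothetical.

```c
#define _DEFAULT_SOURCE        /* MAP_ANONYMOUS on glibc */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define TRACE_CAP (1u << 20)   /* hypothetical capacity: 1M indices */

struct shared_trace {
    volatile uint32_t count;   /* entries written so far */
    uint32_t idx[TRACE_CAP];
};

struct shared_trace *trace;    /* MAP_SHARED, so the watcher sees every write */

/* Record one basic-block index (call trace_start_watcher first). The entry
   is stored before count is bumped, so a kill in between never exposes
   an unwritten slot. */
void trace_record(uint32_t index)
{
    uint32_t n = trace->count;
    if (n < TRACE_CAP) {
        trace->idx[n] = index;
        trace->count = n + 1;
    }
}

/* Map the shared buffer and fork a watcher that dumps it to `path`
   once the calling (traced) process is gone. Returns 0 on success. */
int trace_start_watcher(const char *path)
{
    int fds[2];
    trace = mmap(NULL, sizeof *trace, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (trace == MAP_FAILED || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                   /* watcher process */
        close(fds[1]);                /* only the traced process keeps a write end */
        setsid();                     /* new session: no Ctrl+C or hangup from the terminal */
        char c;
        ssize_t r;
        do {                          /* blocks until EOF, i.e. the traced process died */
            r = read(fds[0], &c, 1);
        } while (r > 0 || (r < 0 && errno == EINTR));
        FILE *out = fopen(path, "wb");
        if (out != NULL) {
            fwrite(trace->idx, sizeof trace->idx[0], trace->count, out);
            fclose(out);
        }
        _exit(0);
    }
    close(fds[0]);                    /* fds[1] stays open until this process exits */
    return 0;
}
```

One caveat of the pipe approach: processes forked by the traced program inherit the write end, so the dump is delayed until they exit as well.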

Answer 4

But from some debugging experience, the target C program, say, an md5sum binary, does not call exit when it finishes execution.

Let's take a look at an md5sum binary on an i686 GNU/Linux system:

In the disassembly (objdump -d /usr/bin/md5sum) we have this:

Disassembly of section .text:

08048f50 <.text>:
 8048f50:       55                      push   %ebp
 8048f51:       89 e5                   mov    %esp,%ebp
 8048f53:       57                      push   %edi
 8048f54:       56                      push   %esi
 8048f55:       53                      push   %ebx
 8048f56:       83 e4 f0                and    $0xfffffff0,%esp
 8048f59:       81 ec c0 00 00 00       sub    $0xc0,%esp
 8048f5f:       8b 7d 0c                mov    0xc(%ebp),%edi

[ ... ]

 8049e8f:       68 b0 d6 04 08          push   $0x804d6b0
 8049e94:       68 40 d6 04 08          push   $0x804d640
 8049e99:       51                      push   %ecx
 8049e9a:       56                      push   %esi
 8049e9b:       68 50 8f 04 08          push   $0x8048f50
 8049ea0:       e8 4b ef ff ff          call   8048df0 <__libc_start_main@plt>
 8049ea5:       f4                      hlt

This is all startup boilerplate code. The program's actual main is called from inside __libc_start_main (its address, 0x8048f50, is the last value pushed before the call). If the program returns from that call, then, hey look, there is a hlt instruction. That's your target. Look for that hlt instruction and instrument it as the end of the program.

Answer 5

You could try this:

#include <iostream>
#include <string>

int main()
{
    bool keepGoing = true;
    while (keepGoing) {
        std::string x;
        if (!(std::cin >> x))   // also stop at end of input
            break;
        if (x == "stop") {
            keepGoing = false;
        }
    }
}

It is primitive, and I probably butchered the code, but it's just a concept.

Answers (5), with score and posting time:

Answer 1: score 4, 2015-07-22 16:19:03
Answer 2: score 2, 2015-07-22 16:44:14
Answer 3: score 0, 2015-07-22 19:06:54
Answer 4: score 0, 2016-07-06 22:53:36
Answer 5: score -1, 2016-07-06 22:27:43
