
Tracing a program from its entry point with ptrace (linux, c)

I want to trace the registers and instructions of a program using ptrace. To make my code easier to follow, I have reduced it to the point where it just counts the number of instructions executed by "/bin/ls".

Here is my code (ignore the unnecessary includes):

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>    
#include <sys/syscall.h>

int main()
{   
    pid_t child;
    child = fork(); //create child
    
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        char* child_argv[] = {"/bin/ls", NULL};
        execv("/bin/ls", child_argv);
        //execv only returns on failure; exit instead of falling through
        perror("execv");
        _exit(1);
    }
    else {
        int status;
        long long ins_count = 0;
        while(1)
        {
            //stop tracing if child terminated successfully
            wait(&status);
            if(WIFEXITED(status))
                break;

            ins_count++;
            ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        }

        printf("\n%lld Instructions executed.\n", ins_count);

    }
    
    return 0;
}

When I run this code I get "484252 Instructions executed", which I really doubt. I googled and found out that most of these instructions come from loading libraries before the actual program (/bin/ls) is executed.

How can I skip the single-stepping until the first actual instruction of /bin/ls and count from there?

You're right: your count includes the dynamic linker doing its job (and AFAIK a single phantom instruction just before the binary starts executing).

(I'm using shell commands, but it can all be done from C code as well, using elf.h; see the musl dynamic linker for a nice example.)

You could:

  • Parse /bin/ls's ELF header to find the entrypoint and the program header containing the entrypoint (I'm using cat here as it's easier to keep it running for a long time while I'm writing this):
# readelf -l /bin/cat

Elf file type is EXEC (Executable file)
Entry point 0x4025b0
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
(...)
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000b36c 0x000000000000b36c  R E    0x200000
(...)

The entrypoint lies between VirtAddr and VirtAddr+FileSiz, and the flags include the executable bit (E), so it looks like we're on the right track.

Note: an ELF file type of EXEC (and not DYN) means that the program headers are always mapped to the fixed location specified in VirtAddr, so for my build of cat we could just use the entrypoint address found above. DYN binaries can be, and are, loaded at an arbitrary address, so for them we need to do the relocation dance described next.
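
The readelf step can also be done from C with elf.h. The following is only a minimal sketch, assuming a 64-bit ELF file and a host with the same byte order; struct elf_info and read_elf_info() are hypothetical names introduced here for illustration, not something from the answer or from any library.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: the values we read off "readelf -l" by hand. */
struct elf_info {
    Elf64_Half type;             /* ET_EXEC or ET_DYN */
    Elf64_Addr entry;            /* e_entry from the ELF header */
    Elf64_Addr first_load_vaddr; /* p_vaddr of the first PT_LOAD header */
    int entry_in_exec_segment;   /* 1 if an executable PT_LOAD contains entry */
};

/* Hypothetical helper; assumes a 64-bit ELF file in host byte order.
 * Returns 0 on success, -1 on any error. */
static int read_elf_info(const char *path, struct elf_info *out)
{
    Elf64_Ehdr eh;
    int rc = -1;
    int seen_load = 0;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto done;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    out->type = eh.e_type;
    out->entry = eh.e_entry;
    out->first_load_vaddr = 0;
    out->entry_in_exec_segment = 0;

    /* Walk the program header table, looking only at PT_LOAD entries. */
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto done;
        if (ph.p_type != PT_LOAD)
            continue;
        if (!seen_load) {
            out->first_load_vaddr = ph.p_vaddr;
            seen_load = 1;
        }
        /* Same check as in the text: VirtAddr <= entry < VirtAddr+FileSiz,
         * and the segment carries the E flag. */
        if ((ph.p_flags & PF_X) && eh.e_entry >= ph.p_vaddr &&
            eh.e_entry < ph.p_vaddr + ph.p_filesz)
            out->entry_in_exec_segment = 1;
    }
    rc = seen_load ? 0 : -1;
done:
    fclose(f);
    return rc;
}
```

Calling read_elf_info("/bin/cat", &info) on the system shown above would be expected to fill info.entry with the "Entry point" value printed by readelf, and info.type tells you whether the relocation step is needed at all (ET_DYN) or not (ET_EXEC).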

  • Find the actual load address of the binary

The ELF specification requires LOAD program headers to appear in ascending VirtAddr order, so the first program header of type LOAD will be mapped to the lowest address. Open /proc/<pid>/maps and look for your binary:

# grep /bin/cat /proc/7431/maps
00400000-0040c000 r-xp 00000000 08:03 1046541                            /bin/cat
0060b000-0060c000 r--p 0000b000 08:03 1046541                            /bin/cat
0060c000-0060d000 rw-p 0000c000 08:03 1046541                            /bin/cat

The first segment is mapped to 0x00400000 (which is expected for ELF type EXEC). If it were not, you'd need to adjust the entrypoint address:

actual_entrypoint_addr = elf_entrypoint_addr - elf_virt_addr_of_first_phdr + actual_addr_of_first_phdr

  • Set a breakpoint on actual_entrypoint_addr and call ptrace(PTRACE_CONT). Once the breakpoint hits (waitpid() returns), proceed as you have so far (count the ptrace(PTRACE_SINGLESTEP) calls).

An example where we would need to handle the relocation:

# readelf -l /usr/sbin/nginx

Elf file type is DYN (Shared object file)
Entry point 0x24e20
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
(...)
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x000000000010df54 0x000000000010df54  R E    0x200000
(...)

# grep /usr/sbin/nginx /proc/1425/maps
55e299e78000-55e299f86000 r-xp 00000000 08:03 660029                     /usr/sbin/nginx
55e29a186000-55e29a188000 r--p 0010e000 08:03 660029                     /usr/sbin/nginx
55e29a188000-55e29a1a4000 rw-p 00110000 08:03 660029                     /usr/sbin/nginx

The entrypoint is at 0x55e299e78000 - 0 + 0x24e20 == 0x55e299e9ce20
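
The lookup in /proc/<pid>/maps and the relocation formula can be sketched in C as follows. This is an illustration only: first_mapping_of() and actual_entry() are hypothetical helpers, and the sketch assumes that the first LOAD segment's VirtAddr is page-aligned (as in both examples above) and that the binary's path appears in maps exactly as you pass it in.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* The relocation formula from the text, as a function. */
static unsigned long actual_entry(unsigned long elf_entrypoint_addr,
                                  unsigned long elf_virt_addr_of_first_phdr,
                                  unsigned long actual_addr_of_first_phdr)
{
    return elf_entrypoint_addr - elf_virt_addr_of_first_phdr
           + actual_addr_of_first_phdr;
}

/* Hypothetical helper: start address of the first mapping in
 * /proc/<pid>/maps whose pathname equals path. The file is sorted by
 * address, so the first match is the lowest one. Returns 0 if the file
 * cannot be read or no mapping matches. */
static unsigned long first_mapping_of(pid_t pid, const char *path)
{
    char maps_path[64];
    char line[4096];
    unsigned long result = 0;
    FILE *f;

    snprintf(maps_path, sizeof maps_path, "/proc/%d/maps", (int)pid);
    f = fopen(maps_path, "r");
    if (!f)
        return 0;

    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char name[4096];

        /* Fields: start-end perms offset dev inode pathname */
        int n = sscanf(line, "%lx-%lx %*s %*s %*s %*s %4095[^\n]",
                       &start, &end, name);
        if (n == 3 && strcmp(name, path) == 0) {
            result = start;
            break;
        }
    }
    fclose(f);
    return result;
}
```

With the nginx numbers above, actual_entry(0x24e20, 0x0, 0x55e299e78000) evaluates to 0x55e299e9ce20, and with the cat numbers actual_entry(0x4025b0, 0x400000, 0x400000) is simply 0x4025b0 again.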
