Number of executed instructions different for Hello World program in NASM assembly and C

I have a simple debugger (using ptrace: http://pastebin.com/D0um3bUi) to count the number of instructions executed for a given input executable program. It uses ptrace single-step execution mode to count instructions.
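For readers who cannot open the pastebin link, here is a minimal sketch of how such a counter can be built. It is not the code from the link: the function name count_instructions is made up for illustration, signals are not forwarded to the traced program, and it assumes Linux with ptrace permitted.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `path` under ptrace and returns the number of
 * single steps issued until the traced process ends, or -1 on error.
 * Returns 0 if the program could not be exec'd (the child never stops). */
long count_instructions(const char *path)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then exec the target. The kernel stops
         * the child with SIGTRAP once the exec succeeds. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl(path, path, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    long steps = 0;
    int status;

    /* Wait for the initial stop at exec. */
    if (waitpid(child, &status, 0) < 0)
        return -1;

    while (WIFSTOPPED(status)) {
        /* Let the tracee execute one instruction, then stop again.
         * Simplification: pending signals are dropped, not forwarded. */
        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) < 0) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);
            return -1;
        }
        steps++;
        if (waitpid(child, &status, 0) < 0)
            return -1;
    }
    return steps;
}
```

Calling count_instructions("./a.out") from a small main and printing the result gives the kind of count discussed here. Every instruction from the process entry point onward is counted, including the dynamic loader and the C startup code, not just main.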

For that, when the executable of program 1) (a.out from gcc main.c) is given as input to my test debugger, it prints around 100k instructions executed. When I use the -static option, it gives 10681 instructions.

Now in 2) I create an assembly program, use NASM and ld to assemble and link it, and when this executable is given as input to the test debugger it shows 8 instructions as the count, which is apt.

Is the number of instructions executed in program 1) high because the program is linked with system libraries at runtime? Using -static reduces the count by roughly a factor of 10. How can I ensure that the instruction count is only that of the main function in program 1), which is how program 2) reports to the debugger?

1)

#include <stdio.h>

int main()
{
    printf("Hello, world!\n");
    return 0;
}    

I use gcc to create the executable.

2)

; 64-bit "Hello World!" in Linux NASM

global _start            ; global entry point export for ld

section .text
_start:

    ; sys_write(stdout, message, length)

    mov    rax, 1        ; sys_write
    mov    rdi, 1        ; stdout
    mov    rsi, message    ; message address
    mov    rdx, length    ; message string length
    syscall

    ; sys_exit(return_code)

    mov    rax, 60        ; sys_exit
    mov    rdi, 0        ; return 0 (success)
    syscall

section .data
    message: db 'Hello, world!',0x0A    ; message and newline
    length:    equ    $-message        ; NASM definition pseudo-instruction

I build with:

nasm -f elf64 -o main.o -s main.asm  
ld -o main main.o

The number of instructions executed in program 1) is high because of linking the program with system libraries at runtime?

Yep, dynamic linking plus CRT (C runtime) startup files.

Used -static, which reduces the count by roughly a factor of 10.

So that just left the CRT startup files, which do stuff before calling main, and after.

How can I ensure that the instruction count is only that of the main function in Program 1)

Measure an empty main, then subtract that number from future measurements.

Unless your instruction counter is smarter, and looks at symbols in the executable for the process it's tracing, it won't be able to tell which code came from where.
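As a rough sketch of what "smarter" could look like, a tracer can read the instruction pointer at each stop and only count steps that fall inside a known address range. The helper names below are made up for illustration, and the range (start address and size of main) is assumed to come from something like nm -S on a static, non-PIE executable; for a PIE build the load base would have to be added. The register-reading part is x86-64 only.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* True when pc lies inside the half-open range [start, start + size). */
int pc_in_range(uint64_t pc, uint64_t start, uint64_t size)
{
    return pc >= start && pc - start < size;
}

#if defined(__x86_64__)
/* Hypothetical helper, meant to be called while the tracee is stopped and
 * before the next single step: returns 1 if the tracee's instruction
 * pointer is inside the range, 0 if it is not, and -1 if the registers
 * could not be read. */
int tracee_in_range(pid_t pid, uint64_t start, uint64_t size)
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;
    return pc_in_range(regs.rip, start, size);
}
#endif
```

Note that this only counts instructions located in main itself. Code that main calls, such as printf, lives at other addresses and would need its own ranges or some call-depth tracking.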

and which is how Program 2) is reporting for the debugger.

That's because there is no other code in that program. It's not that you somehow helped the debugger ignore some instructions, it's that you made a program without any instructions you didn't put there yourself.

If you want to see what actually happens when you run the gcc output, run gdb a.out, then b _start, r, and single-step. Once you get deep in the call tree, you're probably going to want to use fin to finish execution of the current function, since you don't want to single-step through literally 1 million instructions, or even 10k.


Related: How do I determine the number of x86 machine instructions executed in a C program? shows that perf stat will count 3 user-space instructions total in a NASM program that does mov eax, 231 / syscall, linked into a static executable.

Peter gave a very good answer, and I'm going to follow up with a response that is cringe-worthy and might garner some downvotes. When linking directly with LD or indirectly with GCC, the default entry point for ELF executables is the label _start.

Your NASM code uses a global label _start, so when your program is run the first code in your program will be the instructions of _start. When using GCC, your program's typical entry point is the function main. What is hidden from you is that your C program also has a _start label, but it is supplied by the C runtime startup objects.

The question now is: is there a way to bypass the C startup files so that the startup code can be avoided? Technically yes, but this is perilous territory that could yield undefined behaviour. If you are adventurous you can actually tell GCC to change the entry point of your program with the -e command line option. Rather than _start, we could make main our entry point, bypassing the C startup code. Since we are bypassing the C startup code, we can also dispense with linking in the C runtime startup code by using the -nostartfiles option.

You could use this command line to compile your C program:

gcc test.c -e main -nostartfiles

Unfortunately, there is a bit of a gotcha that has to be fixed in the C code. Normally when using the C runtime startup objects, after the environment is initialized a CALL is made to main. Normally main does a RET instruction which returns back to the C runtime code. At that point the C runtime gracefully exits your program. RET doesn't have anywhere to return to when the -nostartfiles option is used, so it will likely segfault. To get around that we can call the C library _exit function to exit our program.

#include <stdio.h>
#include <unistd.h>   /* declares _exit */

int main()
{
    printf("Hello, world!\n");
    _exit(0);  /* We exit application here, never reaching the return */

    return 0;
}   

Unless you omit frame pointers there are a few extra instructions emitted by GCC to set up the stack frame and tear it down, but the overhead is minimal.

Special Note

The process above doesn't seem to work for static builds (-static option in GCC) with the standard glibc C library. This is discussed in this Stackoverflow answer. The dynamic version works because a shared object can register a function that gets called by the dynamic loader to perform initialization. When building statically this is generally done by the C runtime, but we've skipped that initialization. Because of that, GLIBC functions like printf can fail. There are replacement C libraries that are standards compliant and can operate without C runtime initialization. One such product is MUSL.

Installing MUSL as an alternative to GLIBC

On Ubuntu 64-bit these commands should build and install the 64-bit version of MUSL:

git clone git://git.musl-libc.org/musl
cd musl
./configure --prefix=/usr/local/musl/x86-64
make
sudo make install

You can then use the MUSL wrapper for GCC to work with MUSL's C library instead of the default GLIBC library on most Linux distributions. Parameters are just like GCC's, so you should be able to do:

/usr/local/musl/x86-64/bin/musl-gcc -e main -static -nostartfiles test.c

When running the ./a.out generated with GLIBC, it would likely segfault. MUSL doesn't need initialization prior to using most of the C library functions, so it should work even with the -static GCC option.


A fairer comparison

One of the issues with your comparison is that you call the SYS_WRITE system call directly in NASM, while in C you are using printf. User EOF correctly commented that you might want to make it a fairer comparison by calling the write function in C instead of printf. write has far less overhead. You could amend your code to be:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    char *str = "Hello, world\n";
    write (STDOUT_FILENO, str, 13);
    _exit(0);
    return 0;
}

This will have more overhead than NASM's direct SYS_WRITE syscall, but far less than what printf would generate.
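To get a little closer to what the NASM version does, the system call can also be requested by number through the C library's generic syscall() function. This is only a sketch: the helper name write_hello is made up for illustration, and the call still passes through a small libc wrapper rather than being a bare syscall instruction.

```c
#define _GNU_SOURCE             /* exposes the declaration of syscall() */
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issues the write system call by number, as the
 * NASM program does, and returns the number of bytes written or -1. */
long write_hello(void)
{
    static const char msg[] = "Hello, world!\n";

    /* sizeof msg - 1 drops the terminating NUL, leaving 14 bytes. */
    return syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);
}
```

In the program above, main would call write_hello() and then _exit(0) as before.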


I'm going to issue the caveat that such code and trickery would likely not go over well in a code review, except in some fringe cases of software development.
