GCC/g++ cout << vs. printf()

  • Why does printf("hello world") end up using more CPU instructions in the assembled code (not counting the standard library code itself) than cout << "hello world"?

For C++ we have:

movl    $.LC0, %esi
movl    $_ZSt4cout, %edi
call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc

For C:

movl    $.LC0, %eax
movq    %rax, %rdi
movl    $0, %eax
call    printf
  • What are line 2 of the C++ code and lines 2 and 3 of the C code for?

I'm using gcc version 4.5.2.

With 64-bit gcc -O3 (4.5.0) on Linux x86_64, cout << "Hello World" compiles to:

movl    $11, %edx         ; String length in EDX
movl    $.LC0, %esi       ; String pointer in ESI
movl    $_ZSt4cout, %edi  ; address of the "cout" object (the ostream& argument)
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l

and for printf("Hello World"):

movl    $.LC0, %edi       ; String pointer to EDI
xorl    %eax, %eax        ; AL = 0: no vector (XMM) registers hold arguments
call    printf

This means the exact sequence depends on the specific compiler implementation, its version, and probably the compiler options. Your edit says you use gcc 4.5.2 (which is fairly new). For whatever reason, 4.5.2 seems to add some extra 64-bit register fiddling to this sequence: it copies the 64-bit RAX into RDI and then zeroes RAX, which makes no sense at all (at least to me).

Much more interesting is a three-argument call sequence (g++ -O1 -S source.cpp):

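#include <cstdio>
#include <iostream>

void c_proc()
{
    // one variadic call with four arguments
    std::printf("%s %s %s", "Hello", "World", "!");
}

void cpp_proc()
{
    // three separate operator<< calls, one per <<
    std::cout << "Hello " << "World " << "!";
}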

leads to (c_proc):

movl    $.LC0, %ecx
movl    $.LC1, %edx
movl    $.LC2, %esi
movl    $.LC3, %edi
movl    $0, %eax
call    printf

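with .LCx being the strings, and no stack pointer involved! All four arguments travel in registers (EDI/RDI, ESI/RSI, EDX/RDX, ECX/RCX), because the x86-64 System V calling convention passes the first six integer or pointer arguments in registers.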

For cpp_proc:

movl    $6, %edx
movl    $.LC4, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl    $6, %edx
movl    $.LC5, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl    $1, %edx
movl    $.LC0, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l

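Now you can see what this is all about: the printf version loads its four arguments and makes a single call, while the C++ version makes a separate call for every <<.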

Regards

rbo

Most of the time, the caller-side code is irrelevant to performance.

I guess line 2 of the C++ code passes the address of std::cout to operator<<. (Strictly speaking, the operator<< for const char* is a non-member function in namespace std, so cout is its explicit first argument rather than an implicit 'this'; either way it is passed in EDI/RDI.)

And I might be wrong about the C part, but it seems incomplete to me: the upper 32 bits of RAX are not initialized in this snippet, so they might have been initialized earlier. (No, I'm wrong here: on x86-64, writing a 32-bit register such as EAX zero-extends the result into the full 64-bit RAX.)

From what I understand (I might be wrong), the issue with 64-bit registers is that loading a full 64-bit immediate needs a longer instruction, so the compiler uses 32-bit operations to get the desired result: writing EDI zero-extends into RDI, and in the default small code model addresses such as $.LC0 fit in 32 bits. So the compiler works with 32-bit registers to initialize the 64-bit RDI register.

It also seems that printf takes the value of AL (the low byte of EAX) as an input telling it how many XMM (128-bit) registers are used for arguments. This is part of the x86-64 System V calling convention for variadic functions: floating-point arguments such as double are passed in XMM registers, and AL (an upper bound on the number of vector registers used) lets printf's prologue skip saving the XMM registers when none carry arguments.

int printf(const char*, ...) is a variadic function that can take one or more arguments, whereas ostream& operator<<(ostream&, const char*) takes exactly two. I believe that accounts for the difference in the instructions needed to invoke them.

Line 2 in the C++ disassembly is where it passes the ostream& (in this case cout), so the function knows which stream object it is writing to.

Since both end up making a function call, the comparison is largely irrelevant; the code executed inside the called function matters far more. operator<< is overloaded for a number of right-hand-side types, and the overload is resolved at compile time; printf(), on the other hand, must parse the format string at runtime to determine the argument types, so it may incur additional overhead. Either way, the instructions executed inside the functions will swamp the call overhead, and the total will almost certainly be dominated by the OS code required to render the text on a graphical display. In short, you are sweating the small stuff.

movl is "move long", a 32-bit move.

movq is "move quad", a 64-bit move.

printf has a return value, either the number of characters written or a negative value on failure, and that value comes back in %eax after the call. (The movl $0, %eax before the call is a separate matter: for a variadic call, the x86-64 System V ABI uses AL to tell printf how many vector registers hold arguments.)
