How many asm-instructions per C-instruction?

I realize that this question is impossible to answer absolutely, but I'm only after ballpark figures:

Given a reasonably sized C program (thousands of lines of code), how many ASM instructions would be generated on average? In other words, what's a realistic C-to-ASM instruction ratio? Feel free to make assumptions, such as 'with current x86 architectures'.

I tried to Google this, but I couldn't find anything.

Addendum: Noticing how much confusion this question caused, I feel an explanation is needed. What I wanted to learn from this question is what "3GHz" means in practical terms. I am fully aware that the throughput per hertz varies tremendously depending on the architecture, your hardware, caches, bus speeds, and the position of the moon.

I am not after a precise and scientific answer, but rather an empirical answer that could be put into fathomable scales.

This isn't a trivial question to pose (as I came to notice), and this was my best effort at it. I know that the number of resulting lines of ASM per line of C varies depending on what you are doing. i++ is not in the same neighborhood as sqrt(23.1); I know this. Additionally, no matter what ASM I get out of the C, the ASM is translated into various sets of microcode within the processor, which, again, depends on whether you are running AMD, Intel or something else, and on their respective generations. I'm aware of this as well.

The ballpark answers I've got so far are what I have been after: a large enough project averages about 2 lines of x86 ASM per line of ANSI C. Today's processors would probably average about one ASM instruction per clock cycle, once the pipelines are filled and given a big enough sample.

There is no answer possible. Statements like int a; might require zero asm lines, while statements like a = call_is_inlined(); might require 20+ asm lines.

You can see for yourself by compiling a C program and then running objdump -Sd ./a.out. It will display asm and C code intermixed, so you can see how many asm lines are generated for one C line. Example:

test.c

int get_int(int c);
int main(void) {
    int a = 1, b = 2;
    return get_int(a) + b;
}

$ gcc -c -g test.c

$ objdump -Sd ./test.o

00000000 <main>:
int get_int(int c);
int main(void) { /* here, the prologue creates the frame for main */
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  -0x4(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   51                      push   %ecx
   e:   83 ec 14                sub    $0x14,%esp
    int a = 1, b = 2; /* initializing the locals; the prologue already reserved their space */
  11:   c7 45 f4 01 00 00 00    movl   $0x1,-0xc(%ebp)
  18:   c7 45 f8 02 00 00 00    movl   $0x2,-0x8(%ebp)
    return get_int(a) + b;
  1f:   8b 45 f4                mov    -0xc(%ebp),%eax
  22:   89 04 24                mov    %eax,(%esp)
  25:   e8 fc ff ff ff          call   26 <main+0x26>
  2a:   03 45 f8                add    -0x8(%ebp),%eax
} /* the epilogue runs, returning to the previous frame */
  2d:   83 c4 14                add    $0x14,%esp
  30:   59                      pop    %ecx
  31:   5d                      pop    %ebp
  32:   8d 61 fc                lea    -0x4(%ecx),%esp
  35:   c3                      ret

I'm not sure what you mean by "C-instruction"; maybe statement or line? Of course this will vary greatly due to a number of factors, but after looking at a few sample programs of my own, many of them are close to the 2-to-1 mark (2 assembly instructions per LOC). I don't know what this means or how it might be useful.

You can figure this out yourself for any particular program and implementation combination by asking the compiler to generate only the assembly (gcc -S, for example) or by using a disassembler on an already compiled executable (but you would need the source code to compare it to anyway).

Edit

Just to expand on this based on your clarification of what you are trying to accomplish (understanding how many lines of code a modern processor can execute in a second):

While a modern processor may run at 3 billion cycles per second, that doesn't mean that it can execute 3 billion instructions per second. Here are some things to consider:

  • Many instructions take multiple cycles to execute (division or floating point operations can take dozens of cycles to execute).
  • Most programs spend the vast majority of their time waiting for things like memory accesses, disk accesses, etc.
  • Many other factors, including OS overhead (scheduling, system calls, etc.), are also limiting factors.

But in general, yes, processors are incredibly fast and can accomplish amazing things in a short period of time.
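The arithmetic behind that caveat can be sketched as lines per second ≈ clock rate × instructions per cycle ÷ instructions per line. The function name lines_per_second and the sample figures below are assumptions for illustration (the 2 instructions per line comes from the ballpark in the question), not measurements.

```c
/* Rough estimate of how many lines of C one core could execute per second.
 * Every input is an assumption supplied by the caller; real programs stall
 * on memory, I/O and the OS, so treat the result as a rough figure only. */
double lines_per_second(double clock_hz, double instr_per_cycle,
                        double instr_per_line)
{
    if (instr_per_line <= 0.0)
        return 0.0; /* avoid dividing by zero on nonsensical input */
    return clock_hz * instr_per_cycle / instr_per_line;
}
```

With 3e9 Hz, 1.0 instruction per cycle and 2.0 instructions per line, this gives 1.5e9 lines per second. If a memory-bound program only sustains 0.25 instructions per cycle, the same formula gives 3.75e8.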

That varies tremendously! I wouldn't believe anyone if they tried to offer a rough conversion.

Statements like i++; can translate to a single INC AX.

Statements for function calls containing many parameters can be dozens of instructions, as the stack is set up for the call.
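A small file like the following, compiled with gcc -S, is one way to compare the two cases side by side. The names sum8, bump and call_many are hypothetical, and the exact instruction counts depend on the compiler, flags and calling convention (on x86-64 System V, for instance, the first six integer arguments travel in registers and only the rest go on the stack).

```c
/* Hypothetical helper with many parameters; noinline (a GCC/Clang
 * attribute) keeps the call visible in the generated assembly. */
__attribute__((noinline))
int sum8(int a, int b, int c, int d, int e, int f, int g, int h)
{
    return a + b + c + d + e + f + g + h;
}

/* One cheap statement: usually a single add/inc-style instruction. */
int bump(int i)
{
    i++;
    return i;
}

/* One statement that must place eight arguments before the call. */
int call_many(int x)
{
    return sum8(x, x + 1, x + 2, x + 3, x + 4, x + 5, x + 6, x + 7);
}
```

Both bump and call_many contain a single interesting C statement, yet the assembly emitted for call_many is typically several times longer.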

Then add compiler optimization, which will generate code in a manner different from how you wrote it and can eliminate instructions.
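As a minimal illustration of that effect, consider the hypothetical function below. Compiled with gcc -O0 -S the loop appears roughly as written; with optimization enabled (for example -O2), compilers commonly reduce the whole body to loading a constant, though this depends on the compiler and version.

```c
/* Sums 1..10 with a loop. Without optimization the loop is emitted as
 * written; an optimizing compiler can evaluate it at compile time and
 * replace the function body with "return 55". */
int sum_to_ten(void)
{
    int total = 0;
    for (int i = 1; i <= 10; i++)
        total += i;
    return total;
}
```

The same five lines of C can therefore map to a dozen or so instructions or to just two, which is one reason a fixed ratio is hard to defend.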

Also, some instructions run better when aligned on machine word boundaries, so NOPs will be peppered throughout your code.

I don't think you can conclude anything useful whatsoever about the performance of real applications from what you're trying to do here, unless 'not precise' means 'within several orders of magnitude'.

You're overgeneralising far too much, and you're dismissing caching, etc., as though it's secondary, whereas it may well be totally dominant.

If your application is large enough to have trended to some average instructions-per-LOC, then it will also be large enough to have I/O or, at the very least, significant RAM access issues to factor in.
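To make the point about memory access concrete, here is a sketch of two functions that do the same additions and differ only in traversal order. The function names are hypothetical; how large the timing gap is (if any) depends on the matrix size, the cache hierarchy and the compiler, which may even interchange the loops at high optimization levels.

```c
#include <stddef.h>

/* Sums an n-by-n matrix stored row by row, visiting elements in the
 * order they sit in memory (cache friendly). */
long sum_by_rows(const int *m, size_t n)
{
    long total = 0;
    for (size_t r = 0; r < n; r++)
        for (size_t c = 0; c < n; c++)
            total += m[r * n + c];
    return total;
}

/* Same additions, but each step jumps n elements ahead, which tends to
 * miss the cache once the matrix is larger than the cache. */
long sum_by_cols(const int *m, size_t n)
{
    long total = 0;
    for (size_t c = 0; c < n; c++)
        for (size_t r = 0; r < n; r++)
            total += m[r * n + c];
    return total;
}
```

The two functions return the same sum and compile to similar instruction counts, so any difference in run time on a large matrix comes from memory behaviour rather than from instructions per line.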

Depending on your environment, you could use the Visual Studio option /FAs.

More here.

I am not sure there is really a useful answer to this. For sure you will have to pick the architecture (as you suggested).

What I would do: take a reasonably sized C program, give gcc the "-S" option and check for yourself. It will generate the assembler source code, and you can calculate the ratio for that program yourself.
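Counting can be automated with a small helper. The sketch below is a heuristic, not a parser: it assumes GNU-style assembly text such as gcc -S emits, and the function name count_asm_instructions is made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Heuristic: counts lines of GNU-style assembly text that look like
 * instructions. Blank lines, directives (leading '.'), comment lines
 * (leading '#') and bare labels (ending in ':') are skipped. */
size_t count_asm_instructions(const char *text)
{
    size_t count = 0;
    while (*text != '\0') {
        const char *end = strchr(text, '\n');
        size_t len = end ? (size_t)(end - text) : strlen(text);

        /* Trim leading and trailing whitespace from this line. */
        const char *p = text;
        while (len > 0 && isspace((unsigned char)*p)) { p++; len--; }
        while (len > 0 && isspace((unsigned char)p[len - 1])) len--;

        if (len > 0 && p[0] != '.' && p[0] != '#' && p[len - 1] != ':')
            count++;

        if (!end)
            break;      /* last line had no trailing newline */
        text = end + 1;
    }
    return count;
}
```

Read the .s file into a buffer, pass it to this function, and divide the result by the number of non-blank lines in the C source to get a ratio for that one program and that one set of compiler flags.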

RISC or CISC? What's an instruction in C, anyway?

Which is to repeat the points above: you really have no idea until you get very specific about the type of code you're working with.

You might try reviewing the academic literature on assembly optimization and the hardware/software interference cross-talk that has happened over the last 30-40 years. That's where you're going to find some kind of real data about what you're interested in. (Although I warn you, you might wind up seeing C->PDP data instead of C->IA-32 data.)

You wrote in one of the comments that you want to know what 3GHz means.

Even the frequency of the CPU does not matter. Modern PC CPUs interleave and schedule instructions heavily; they fetch and prefetch, cache memory and instructions, and often that cache is invalidated and thrown in the bin. The best measure of processing power is obtained by running real-world performance benchmarks.
