Is it possible to find a list of all of the assembly instructions that GCC can generate?

In the homework for day one of Xeno Kovah's Introduction to x86 Assembly, hosted on OpenSecurityTraining, he assigns:

Instructions we now know (24)

NOP, PUSH/POP, CALL/RET, MOV/LEA, ADD/SUB, JMP/Jcc, CMP/TEST, AND/OR/XOR/NOT, SHR/SHL, IMUL/DIV, REP STOS, REP MOV, LEAVE

Write a program to find an instruction we haven't covered, and report the instruction tomorrow.

He further qualifies the assignment:

  • Instructions to be covered later, which don't count: SAL / SAR
  • Variations on jumps, or the MUL / IDIV variants of IMUL / DIV, also don't count
  • Additional off-limits instructions: anything floating point (since we're not covering those in this class)
  • He says in the video that you cannot use inline assembly (mentioned when asked).

Rather than objdumping random executables, auditing them, and then creating the source, is it possible to find the list of x86 assembly instructions that GCC currently outputs?

The premise behind this question is that only a very small subset of instructions is actually used in practice, and that subset is what one needs to know to reverse engineer (which is the focus of the course). Xeno seems to be trying to find a fun, instructive way to make that point:

I think that knowing about 20-30 (not counting variations) is good enough that you will have to check the manual very infrequently

While I welcome everyone to join me in this awesome class at OpenSecurityTraining, the question is about my proposed method of figuring it out from GCC (if possible), not about people actually doing Xeno's assignment. ;)

The foundation for this question seems to be that there is a very small subset of instructions actually used that one needs to know to reverse engineer

Yes, that's generally true. There are some instructions gcc will never emit, like enter (because it's much slower than push rbp / mov rbp, rsp / sub rsp, some_constant on modern CPUs).
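A quick way to see this for yourself: a function whose frame size is only known at run time needs a frame pointer. The sketch below (sum_vla is a hypothetical name) uses a variable-length array; compiled with gcc -O0, or at -O2 as long as the array isn't optimized away, you would typically expect a push rbp / mov rbp, rsp prologue, a sub rsp for the array, and a leave epilogue, but no enter.

```c
/* Hypothetical example: the VLA makes the frame size a run-time value, so gcc
 * needs rbp as a frame pointer. Typical output: push rbp / mov rbp, rsp,
 * a sub rsp for the array, and leave on the way out, but never enter. */
long sum_vla(int n)
{
    if (n <= 0)
        return 0;             /* a zero-length VLA would be undefined */

    long buf[n];              /* size only known at run time */
    long total = 0;

    for (int i = 0; i < n; i++)
        buf[i] = i;
    for (int i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```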

Other old/obscure stuff like xlat and loop will also be unused because they aren't faster, and gcc's -Os doesn't go all-out optimizing for size without caring about performance. (clang -Oz is more aggressive, but I don't know if anyone has bothered to teach it about the loop instruction.)

And of course gcc will never emit privileged instructions like wrmsr. There are intrinsics (__builtin_... functions) for some unprivileged instructions that aren't "normal", like rdtsc or cpuid.
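For example, rdtsc and cpuid can be reached without writing any inline asm yourself. This is a sketch assuming gcc or clang: __rdtsc() comes from <x86intrin.h>, and __get_cpuid() is a helper in the compiler-supplied <cpuid.h> (as far as I know it is implemented with inline asm inside that header rather than as a true builtin). read_tsc and cpu_vendor are hypothetical wrapper names; on non-x86 targets they just return 0.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc() */
#include <cpuid.h>      /* __get_cpuid() helper shipped with gcc and clang */
#endif

/* Reads the time-stamp counter: gcc emits rdtsc plus a shift/or to combine
 * edx:eax. Returns 0 on non-x86 targets. */
uint64_t read_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}

/* Copies the 12-byte vendor string (e.g. "GenuineIntel") from cpuid leaf 0
 * into out[13]. Returns 1 on success, 0 if unavailable. */
int cpu_vendor(char out[13])
{
    out[0] = '\0';
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    /* The vendor string is returned in ebx, edx, ecx order (little-endian). */
    memcpy(out, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
#else
    return 0;
#endif
}
```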


is it possible to find the list of x86 assembly instructions that GCC currently outputs?

This would be the gcc machine-definition files. GCC, as a portable compiler, has its own text-based language for machine-definition files, which describe the instruction set to the compiler (what each instruction does, what addressing modes it can use, and some kind of "cost" the optimizer can minimize).

See the gcc-internals documentation for them.


The other approach to this question would be to look at an x86 instruction reference manual (e.g. this HTML extract, and see other links in the tag wiki) and look for ones you haven't seen yet. Then write a function where gcc would find that instruction useful.

e.g. if you haven't seen movsx (sign extension) yet, then write

long long foo(int x) { return x; }

and gcc -O3 will emit (from the Godbolt compiler explorer):

    movsx   rax, edi
    ret

Or, to get cdqe (aka cltq in AT&T syntax) for sign extension within rax, force gcc to do math before sign extending, so it can produce the result in eax first (with a copy-and-add lea).

long long bar(unsigned x) { return (int)(x+1); }

    lea     eax, [rdi+1]
    cdqe
    ret

   # clang chooses inc edi  /  movsxd rax, edi
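To confirm that both are value-preserving sign extensions, here is a self-contained copy of the two functions above with a few edge cases. Note that the (int) cast in bar is implementation-defined in C when x + 1 doesn't fit in an int; gcc and clang define it as wrapping modulo 2^32, which the edge cases below assume.

```c
#include <limits.h>

/* Same functions as above, repeated so this compiles on its own. */
long long foo(int x) { return x; }                  /* movsx rax, edi         */
long long bar(unsigned x) { return (int)(x + 1); }  /* lea eax,[rdi+1]; cdqe  */

/* Edge cases (assuming gcc/clang semantics for the cast):
 *   foo(-1)          == -1        sign bit copied into the upper 32 bits
 *   foo(INT_MAX)     == 2147483647
 *   bar(0xFFFFFFFFu) == 0         x + 1 wraps to 0 as unsigned
 *   bar(0x7FFFFFFFu) == INT_MIN   0x80000000 becomes INT_MIN, then sign-extends
 */
```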

See also Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”, and the Q&A "How to remove "noise" from GCC/clang assembly output?".


Getting gcc to emit rotate instructions is interesting; see Best practices for circular shift (rotate) operations in C++. You write it as shifts/OR that gcc can recognize as a rotate.
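A minimal sketch of that idiom, assuming 32-bit operands (rotl32/rotr32 are hypothetical names). Masking both shift counts keeps them in 0..31, so the expression has no undefined behaviour for any n, including 0; current gcc and clang typically compile each function to a single rol or ror.

```c
#include <stdint.h>

/* Rotate left by n bits. -n & 31 equals (32 - n) & 31, and is 0 when n is 0,
 * so neither shift ever reaches 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}

/* Rotate right by n bits, same masking trick. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << (-n & 31));
}
```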

Because C doesn't provide standard functions for lots of things modern CPUs can do (rotate, popcnt, count leading/trailing zeros), the only portable approach is to write an equivalent function and have the compiler recognize that pattern. If you're lucky, gcc and clang can optimize a whole loop into a single popcnt instruction when compiling with -mpopcnt (enabled by -march=haswell, for example). If not, you get a stupidly slow loop. The reliable non-portable way is to use __builtin_popcount(), which compiles to a popcnt instruction if the target supports it, and otherwise to a table lookup. _mm_popcnt_u64 is popcnt or nothing: it doesn't compile if the target doesn't support the instruction.
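A sketch contrasting the two approaches (function names are hypothetical). Whether the loop version is pattern-matched into popcnt depends on the compiler version and on -mpopcnt / -march; the builtin version does not rely on pattern matching at all.

```c
#include <stdint.h>

/* Portable popcount: clear the lowest set bit until none remain.
 * gcc and clang may recognise this loop and emit a single popcnt when the
 * target has it; otherwise it stays a loop. */
unsigned popcount_loop(uint64_t x)
{
    unsigned count = 0;
    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        count++;
    }
    return count;
}

/* Non-portable but reliable: popcnt when the target supports it, otherwise
 * a fallback chosen by the compiler. */
unsigned popcount_builtin(uint64_t x)
{
    return (unsigned)__builtin_popcountll(x);
}
```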


Of course, the catch-22 with this approach is that it only works if you already know the x86 instruction set, and know when any given instruction is the right choice for an optimizing compiler!

(And you also need to know what gcc chooses to do, e.g. inlining string compares as rep cmpsb in some cases for short strings, although I'm not sure this is optimal. Only rep movs / rep stos have "fast strings" support on modern CPUs. But I don't think gcc will ever use lods, or any of the "string" instructions without a rep prefix.)
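For the rep stos side, a fixed-size memset is an easy thing to experiment with. This is a sketch (clear_buffer and BUF_WORDS are hypothetical); whether gcc inlines it as rep stosq, expands it into a run of vector stores, or leaves a call to memset depends on the gcc version, optimization level, and -mtune, with -Os being the most likely to pick rep stos.

```c
#include <stdint.h>
#include <string.h>

#define BUF_WORDS 64   /* hypothetical size: 64 * 8 = 512 bytes */

/* Zeroes a fixed-size buffer. Candidates for gcc's expansion include
 * rep stosq, vector stores, or a plain call to memset. */
void clear_buffer(uint64_t buf[BUF_WORDS])
{
    memset(buf, 0, BUF_WORDS * sizeof buf[0]);
}
```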

Rather than objdumping random executables, auditing them, and then creating the source, is it possible to find the list of x86 assembly instructions that GCC currently outputs?

You can look at the machine description files that gcc uses. In its source tree, look under gcc/config/i386 and have a look at the .md files. The core one for x86 is i386.md; there are others for the various extensions to x86 (possibly also containing heuristic tunings to use when optimizing for different processors).

Be warned: it's definitely not an easy read.

I think that knowing about 20-30 (not counting variations) is good enough that you will have to check the manual very infrequently

It's quite true; in my experience doing reverse engineering, 99% of code uses the same small set of instructions. What is more useful than knowing the entire x86 instruction set is getting familiar with assembly idioms, especially those frequently emitted by compilers.
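If you want to check that claim against your own code, or answer the original question empirically, a tiny filter over gcc -S output goes a long way. The sketch below (asm_line_mnemonic is a hypothetical helper) pulls the mnemonic out of one line of assembly, skipping blank lines, comments, directives, and labels. Prefixes like rep and lock are returned as the mnemonic themselves, which a fuller tool would need to handle.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Extracts the instruction mnemonic from one line of gcc -S output into out
 * (size n). Returns 1 for an instruction line, 0 for blank lines, comments,
 * directives (".text") and labels ("foo:", ".L2:"). */
int asm_line_mnemonic(const char *line, char *out, size_t n)
{
    size_t len = 0;

    if (n == 0)
        return 0;
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#' || *line == '.')
        return 0;

    /* Measure the leading run of identifier characters. */
    while (isalnum((unsigned char)line[len]) || line[len] == '_')
        len++;
    if (len == 0 || line[len] == ':')   /* not an identifier, or a label */
        return 0;

    if (len >= n)                       /* truncate to fit the buffer */
        len = n - 1;
    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}
```

Wrapped in a hypothetical main that reads stdin with fgets and counts how often each mnemonic appears, it could be fed with something like gcc -O2 -S -masm=intel -o - file.c.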


That being said, off the top of my head, some very common instructions that are missing from the list (emitted quite often, even without enabling extended instruction sets) are:

  • movzx / movsx
  • inc / dec (rare with gcc, common with VC++)
  • neg
  • cdq (before idiv)
  • jcxz / jecxz (rare with gcc, somewhat common with VC++)
  • setCC
  • cmpxchg (in synchronization code)
  • cmovCC
  • adc (when doing 64-bit arithmetic in 32-bit code)
  • int3 (often emitted at function boundaries and, in general, as filler)
  • some other string instructions (scas / cmps), especially as canned sequences on older compilers
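Here is minimal C that tends to produce most of the items above. These are hypothetical example functions, and the instruction noted next to each is what gcc -O2 typically emits for x86-64 (adc needs -m32), not a guarantee; is_less, for instance, usually gets an xor-zeroed register plus setl rather than a movzx.

```c
#include <stdatomic.h>
#include <stdint.h>

unsigned widen_byte(unsigned char c) { return c; }   /* movzx */
int widen_sbyte(signed char c) { return c; }         /* movsx */
int negate(int x) { return -x; }                     /* neg */
int divide(int a, int b) { return a / b; }           /* cdq + idiv */
int is_less(int a, int b) { return a < b; }          /* setl */
int max_int(int a, int b) { return a > b ? a : b; }  /* often cmovCC */

/* With -m32, a 64-bit add is split into add (low half) + adc (high half). */
uint64_t add64(uint64_t a, uint64_t b) { return a + b; }

/* C11 compare-and-swap: lock cmpxchg on x86. Returns 1 if the lock was taken. */
int try_lock(atomic_int *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}
```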

And then there's the whole world of SSE & co...
