
Questions about AT&T x86 syntax design

  1. Can anyone explain to me why every constant in AT&T syntax has a '$' in front of it?
  2. Why do all registers have a '%'?
  3. Is this just another attempt to get me to do a lot of lame typing?
  4. Also, am I the only one who finds 16(%esp) really counterintuitive compared to [esp+16]?
  5. I know it compiles to the same thing, but why would anyone want to type a lot of '$'s and '%'s without needing to? Why did GNU choose this syntax as the default?
  6. Another thing: why does every instruction in AT&T syntax end with an 'l'? I do know it's for the operand size, but why not just let the assembler figure that out? (Would I ever want to do a movl on operands that are not that size?)
  7. Last thing: why are the mov arguments inverted?

Isn't it more logical that:

eax = 5
mov eax, 5

whereas AT&T is:

mov $5, %eax
5 = eax (? wait what ?)

Note: I'm not trying to troll. I just don't understand the design choices they made and I'm trying to get to know why they did what they did.

1, 2, 3 and 5: the notation is somewhat redundant, but I find that to be a good thing when developing in assembly. Redundancy helps reading. The point about "let the assembler figure it out" easily turns into "let the programmer who reads the code figure it out", and I do not like it when I am the one doing the reading. Programming is not a write-only task; even the programmer himself must read his own code, and the syntax redundancy helps quite a bit.

Another point is that the '%' and '$' mean that new registers can be added without breaking backward compatibility: there is no problem in adding, e.g., a register called xmm4, as it will be written out as %xmm4, which cannot be confused with a variable called xmm4, which would be written without a '%'.

As for the amount of typing: normally, when programming in assembly, the bottleneck is the brain, not the hand. If the '$' and '%' slow you down, then either you are thinking way faster than what is usually considered doable for a human being, or, more probably, your task at hand is too mechanical and should not be done in assembly; it should be left to an automatic code generator, something colloquially known as a "C compiler".

The 'l' suffix was added to handle some situations where the assembler "cannot" figure it out. For instance, this code:

mov  [esp], 10

is ambiguous, because it does not say whether you want to write a byte of value 10, or a 32-bit word with the same numerical value. The Intel syntax then calls for:

mov  byte ptr [esp], 10

which is quite ugly, when you think about it. The people at AT&T wanted to make something more rational, so they came up with:

movb   $10, (%esp)

and they preferred to be systematic, and have the 'b' (or 'l' or 'w') suffix everywhere. Note that the suffix is not always required. For instance, you can write:

mov   %al, (%ebx)

and let the GNU assembler "figure out" that, since you are talking about '%al', the move is for a single byte. It really works! Yet I still find it better to specify the size (it really helps the reader, and the programmer himself is the first and foremost reader of his own code).
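To make the "figure it out" rule concrete, here is a minimal Python sketch of the idea, not of how the GNU assembler is actually implemented. The table REGISTER_SUFFIX and the function infer_suffix are hypothetical names invented for this illustration, and the table covers only a handful of registers. A plain register operand fixes the size; a memory operand such as (%esp) or an immediate such as $10 does not.

```python
# Hypothetical lookup table covering only a few 8-, 16- and 32-bit registers.
REGISTER_SUFFIX = {
    "%al": "b", "%bl": "b", "%cl": "b", "%dl": "b",
    "%ax": "w", "%bx": "w", "%cx": "w", "%dx": "w",
    "%eax": "l", "%ebx": "l", "%ecx": "l", "%edx": "l",
}


def infer_suffix(operands):
    """Return 'b', 'w' or 'l' if a plain register operand fixes the size, else None."""
    # "(%ebx)" and "$10" are not keys of the table, so they contribute nothing.
    sizes = {REGISTER_SUFFIX[op] for op in operands if op in REGISTER_SUFFIX}
    if len(sizes) == 1:
        return sizes.pop()
    return None  # no register operand, or conflicting sizes: ambiguous


print(infer_suffix(["%al", "(%ebx)"]))  # b
print(infer_suffix(["$10", "(%esp)"]))  # None
```

In the second call nothing pins down the operand size, which is exactly the case where a human has to write movb, movw or movl (or "byte ptr" in Intel syntax).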

For the "inversion": it is the other way round. The Intel syntax mimics what occurs in C, in which values are computed on the right, then written to what is on the left. Thus, the writing goes right to left, in the "reverse" direction, considering that reading goes left to right. The AT&T syntax reverts to the "normal" direction. At least so they considered; since they had decided to use their own syntax anyway, they thought that they could put the operands in what they thought of as "the right ordering". This is mostly a convention, but not an illogical one. The C convention mimics mathematical notation, except that mathematics is about defining values ("let x be the value 5") and not about assigning values ("we write the value 5 into a slot called 'x'"). The AT&T choice makes sense. It is confusing only when you are converting C code to assembly, a task which should usually be left to a C compiler.
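The two orderings carry the same information, which a small Python sketch can show. It is a toy under stated assumptions: att_to_intel and BASE_MNEMONICS are hypothetical names made up for this example, and the function only handles two-operand instructions whose operands are plain registers or immediates.

```python
# Toy set of base mnemonics this sketch knows about (an assumption, not a full list).
BASE_MNEMONICS = {"mov", "add", "sub"}


def att_to_intel(line):
    """Rewrite a two-operand AT&T instruction in Intel operand order."""
    mnemonic, _, rest = line.strip().partition(" ")
    src, dst = [op.strip() for op in rest.split(",")]
    for op in (src, dst):
        if not op.startswith(("$", "%")):
            raise ValueError(f"unsupported operand: {op!r}")
    # Drop a trailing size suffix such as the 'l' in movl.
    if (mnemonic not in BASE_MNEMONICS
            and mnemonic[:-1] in BASE_MNEMONICS
            and mnemonic[-1] in "bwl"):
        mnemonic = mnemonic[:-1]
    # AT&T lists the source first; Intel lists the destination first.
    return f"{mnemonic} {dst[1:]}, {src[1:]}"


print(att_to_intel("movl $5, %eax"))  # mov eax, 5
```

Reading "movl $5, %eax" left to right gives "move 5 into eax"; the Intel form "mov eax, 5" reads like the C assignment eax = 5.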

The last part of your question 5 is interesting from a historical point of view. The GNU tools for x86 followed the AT&T syntax because, at that time, they were trying to take hold in the Unix world ("GNU" means "GNU's Not Unix") and were competing with the Unix tools; Unix was under the control of AT&T. This was before the days of Linux or even Windows 3.0; PCs were 16-bit systems. Unix used the AT&T syntax, hence GNU used the AT&T syntax.

The good question is then: why did AT&T find it smart to invent their own syntax? As described above, they had some reasons, which were not without merit. The cost of using your own syntax, of course, is that it limits interoperability. In those days, a C compiler or assembler made no real sense as a separate tool: in a Unix system, they were meant to be provided by the OS vendor. Also, Intel was not a big player in the Unix world; big systems mostly used VAX or Motorola 680x0 derivatives. Nobody had figured out that the MS-DOS PC would turn into, twenty years later, the dominant architecture in the desktop and server worlds.

1-2, 5: They probably chose to prefix registers and such to make the syntax easier to parse; you know directly from the first character what kind of token it is.
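As a rough illustration of that point, here is a toy Python classifier that decides the operand kind from the first character alone. The name classify_operand is hypothetical and the rules are simplified (for instance, the '*' prefix used for indirect jumps is ignored); it is a sketch of the idea, not a description of any real assembler's lexer.

```python
def classify_operand(token):
    """Classify an AT&T-style operand by looking only at its first character."""
    if token.startswith("$"):
        return "immediate"
    if token.startswith("%"):
        return "register"
    # Anything else (a bare symbol, a bare number, or disp(base)) names memory.
    return "memory"


for tok in ["$10", "%eax", "xmm4", "%xmm4", "16(%esp)"]:
    print(tok, "->", classify_operand(tok))
```

Note that xmm4 and %xmm4 land in different categories, so a symbol can share its name with a register without any ambiguity.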

4: No.

6: Again, probably to make it easier for the parser to figure out what instruction to output.

7: Actually this makes more sense grammatically: move what to where. Perhaps the mov instruction should be an ld instruction.

Don't get me wrong, I think AT&T syntax is horrible.

The GNU assembler's AT&T syntax traces its origins to the Unix assembler [1], which itself took its input syntax mostly from the PDP-11 PAL-11 assembler (ca. 1970).

Can anyone explain to me why every constant in AT&T syntax has a '$' in front of it?

It distinguishes immediate constants from memory addresses. Intel syntax does it the other way around, marking memory references as [foo].

Incidentally, MASM (the Microsoft Assembler) doesn't need a distinction at the syntax level, since it can tell whether the operand is a symbolic constant or a label. Other assemblers for x86 actively avoid such guesses, since they can be confusing to readers, e.g. TASM in IDEAL mode (it warns on memory references not in brackets), NASM, and FASM.

PAL-11 used # for the Immediate addressing mode, where the operand followed the instruction. A constant without # meant Relative addressing mode, where a relative address followed the instruction.

Unix as used the same syntax for addressing modes as the DEC assemblers, with * instead of @ and $ instead of #, since @ and # were apparently inconvenient to type [2].

Why do all registers have a '%'?

In PAL-11, registers were defined as R0=%0, R1=%1, ..., with R6 also referred to as SP and R7 also referred to as PC. The DEC MACRO-11 macro assembler allowed referring to registers as %x, where x could be an arbitrary expression, e.g. %3+1 referred to %4.

Is this just another attempt to get me to do a lot of lame typing?

Nope.

Also, am I the only one who finds 16(%esp) really counterintuitive compared to [esp+16]?

This comes from the PDP-11 Index addressing mode, where a memory address is formed by summing the contents of a register and an index word following the instruction.
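The same "register plus displacement" sum is what 16(%esp) denotes on x86. The Python sketch below is only an illustration: OPERAND, effective_address and to_intel are hypothetical names, only the simple disp(%base) form is handled (no index or scale), and register wrap-around is ignored.

```python
import re

# Matches the simple form disp(%base); scaled-index forms are deliberately left out.
OPERAND = re.compile(r"^(-?\d+)?\(%(\w+)\)$")


def _parse(operand):
    m = OPERAND.match(operand)
    if m is None:
        raise ValueError(f"unsupported operand: {operand!r}")
    return int(m.group(1) or 0), m.group(2)


def effective_address(operand, registers):
    """Sum the base register's contents and the displacement."""
    disp, base = _parse(operand)
    return registers[base] + disp


def to_intel(operand):
    """Render the same operand in Intel bracket notation."""
    disp, base = _parse(operand)
    if disp == 0:
        return f"[{base}]"
    sign = "+" if disp > 0 else "-"
    return f"[{base}{sign}{abs(disp)}]"


print(hex(effective_address("16(%esp)", {"esp": 0x1000})))  # 0x1010
print(to_intel("16(%esp)"))                                 # [esp+16]
```

Both notations describe the same address; they differ only in whether the displacement is written outside the parentheses or inside the brackets.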

I know it compiles to the same thing, but why would anyone want to type a lot of '$'s and '%'s without needing to? Why did GNU choose this syntax as the default?

It came from the PDP-11.

Another thing: why does every instruction in AT&T syntax end with an 'l'? I do know it's for the operand size, but why not just let the assembler figure that out? (Would I ever want to do a movl on operands that are not that size?)

gas can usually figure it out. Other assemblers also need help in particular cases.

The PDP-11 would use b for byte instructions, e.g. CLR vs CLRB. Other suffixes appeared in VAX-11: l for long, w for word, f for float, d for double, q for quad-word, ...

 Last thing: why are the mov arguments inverted? 

Arguably, since the PDP-11 predates Intel microprocessors, it is the other way around.


  1. According to the gas info page, through the BSD 4.2 assembler.
  2. Unix Assembler Reference Manual, §8.1, Dennis M. Ritchie.

The reason AT&T syntax inverts the operand order compared to Intel is most likely that the PDP-11, on which Unix was originally developed, uses the same order of operands.

Intel and DEC simply chose opposite orders.
