How does an assembler work?

I am looking for a brief description of the use of an assembler in producing machine code.

So I know that assembly is a 1:1 translation of machine code. But I am getting confused about object code and linkers and how they fit into it.

I don't need a complex answer; a simple one will do fine.

Both an assembler and a compiler translate source files into object files.

Object files are effectively an intermediate step before the final executable output (generated by the linker).

The linker takes the specified object files and libraries (which are packages of object files) and resolves relocation (or 'fixup') records.

A relocation record is emitted when the compiler or assembler doesn't know the address of a function or variable used in the source code. It generates a reference to that symbol by name, which the linker can resolve later.
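To make the idea concrete, here is a minimal Python sketch of what an assembler might record when it meets a call to a symbol it cannot resolve. This is a toy model, not a real object file format: the `ObjectFile` and `Relocation` classes, the `emit_call` helper, and the "pc-relative-32" label are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Relocation:
    offset: int   # where in the code the placeholder bytes start
    symbol: str   # name the linker has to look up
    kind: str     # how the patched value is computed (just a label in this toy)


@dataclass
class ObjectFile:
    code: bytearray = field(default_factory=bytearray)
    defined: dict = field(default_factory=dict)       # symbol name -> offset in code
    relocations: list = field(default_factory=list)


def emit_call(obj: ObjectFile, target: str) -> None:
    """Emit an x86 near call (opcode 0xE8) to a symbol whose address is unknown."""
    obj.code.append(0xE8)
    # The 4 bytes after the opcode are the part the linker must fill in.
    obj.relocations.append(
        Relocation(offset=len(obj.code), symbol=target, kind="pc-relative-32")
    )
    obj.code += bytes(4)  # zero placeholder


main_o = ObjectFile()
main_o.defined["_start"] = 0
emit_call(main_o, "do_message")

print(main_o.code.hex(" "))      # e8 00 00 00 00
print(main_o.relocations[0])
```

The point of the sketch is that the assembler never needs the real address: it writes zeros and leaves a note saying which symbol belongs at which offset.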

For example, say you want a program that prints a message to the screen, separated into two source files, and you want to assemble them separately and then link them (the example is 64-bit Linux code that makes its system calls through int 0x80):

main.asm:

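bits 64
section .text
extern do_message       ; defined in message.asm; address unknown until link time
global _start           ; export the entry point so the linker can find it
_start:
    call do_message     ; assembled with a zero placeholder plus a relocation record
    mov rax, 1          ; 1 = exit in the int 0x80 (32-bit compatibility) syscall table
    int 0x80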

message.asm:

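bits 64
section .text
global do_message       ; export so the reference in main.o can be resolved
do_message:
    mov rdi, message    ; rdi -> start of the string
    mov rcx, dword -1   ; rcx = -1: effectively no limit on the scan
    xor rax, rax        ; al = 0, the byte to search for
    repnz scasb         ; advance rdi until just past the terminating 0
    sub rdi, message    ; rdi = byte count, including the terminator
    mov rax, 4          ; 4 = write in the int 0x80 syscall table
    mov rbx, 1          ; fd 1 = stdout
    mov rcx, message    ; buffer
    mov rdx, rdi        ; length
    int 0x80
    ret

section .data
message: db "hello world",10,0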

If you assemble these and look at the object file output of main.asm (e.g., objdump -d main.o), you will notice that the address field of 'call do_message' is 00 00 00 00. That is only a placeholder, not a real target, which is why the disassembler shows it as a call to the next instruction.

0000000000000000 <_start>:
   0:   e8 00 00 00 00          callq  5 <_start+0x5>
   5:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
   c:   cd 80                   int    $0x80

But a relocation record is made for the 4 bytes of the address:

$ objdump -r main.o
main.o:     file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000001 R_X86_64_PC32     do_message+0xfffffffffffffffc
000000000000000d R_X86_64_32       .data

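The offset is 1 (the byte right after the e8 opcode) and the type is R_X86_64_PC32, which tells the linker to resolve this reference and write the result, as a 32-bit value relative to the program counter, at the specified offset. The addend of -4 (shown as 0xfffffffffffffffc) compensates for the CPU measuring the displacement from the end of the 4-byte field rather than from its start.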
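The arithmetic behind that record can be sketched in a few lines of Python. For a PC-relative 32-bit relocation the value written is S + A - P, where S is the symbol's final address, A is the addend, and P is the address of the field being patched. The `apply_pc32` helper is a hypothetical name; the addresses 0x4000f0 and 0x400100 are the ones that appear in the disassembly of the linked executable further down.

```python
import struct


def apply_pc32(code: bytearray, offset: int, section_addr: int,
               symbol_addr: int, addend: int) -> None:
    """Patch a 32-bit PC-relative field in place: value = S + A - P."""
    place = section_addr + offset           # P: address of the field being patched
    value = symbol_addr + addend - place    # S + A - P
    code[offset:offset + 4] = struct.pack("<i", value)  # little-endian signed 32-bit


# Bytes of _start as they appear in main.o, with the zero placeholder after e8.
text = bytearray.fromhex("e8 00 00 00 00 48 c7 c0 01 00 00 00 cd 80")

apply_pc32(text, offset=1, section_addr=0x4000F0, symbol_addr=0x400100, addend=-4)
print(text[:5].hex(" "))   # e8 0b 00 00 00
```

Only the four placeholder bytes change; the rest of the instruction stream is copied into the executable untouched.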

When you link the final program with 'ld -o program main.o message.o', all the relocations are resolved, and if nothing is left unresolved, you get an executable.
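The "nothing is left unresolved" check can be pictured as building one table of defined symbols across all inputs and then looking up every referenced name in it. The Python below is a rough sketch under that assumption; the `resolve_symbols` function and the dictionary layout are made up for illustration and say nothing about how ld is actually implemented.

```python
def resolve_symbols(objects):
    """objects: file name -> {"defines": set of names, "needs": set of names}."""
    defined = {}
    for filename, obj in objects.items():
        for sym in obj["defines"]:
            if sym in defined:
                raise ValueError(
                    f"duplicate symbol {sym!r} in {filename} and {defined[sym]}"
                )
            defined[sym] = filename
    # Any referenced name that no input file defines is an unresolved symbol.
    undefined = sorted(
        {sym for obj in objects.values() for sym in obj["needs"] if sym not in defined}
    )
    return defined, undefined


objects = {
    "main.o": {"defines": {"_start"}, "needs": {"do_message"}},
    "message.o": {"defines": {"do_message"}, "needs": set()},
}

defined, undefined = resolve_symbols(objects)
print(defined["do_message"])   # message.o
print(undefined)               # []

# Linking main.o on its own leaves do_message unresolved.
_, missing = resolve_symbols({"main.o": objects["main.o"]})
print(missing)                 # ['do_message']
```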

When we run 'objdump -d' on the executable, we can see the resolved address:

00000000004000f0 <_start>:
  4000f0:   e8 0b 00 00 00          callq  400100 <do_message>
  4000f5:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
  4000fc:   cd 80                   int    $0x80
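As a quick check on the two disassembly listings, the destination of a near call can be recomputed from its bytes: it is the address of the next instruction plus the signed 32-bit displacement. A small Python sketch, with `call_target` as a hypothetical helper name:

```python
import struct


def call_target(insn_addr: int, insn: bytes) -> int:
    """Return the destination of an x86 near call (e8 followed by rel32)."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 call instruction")
    (rel32,) = struct.unpack("<i", insn[1:5])
    return insn_addr + len(insn) + rel32   # relative to the next instruction


# Linked executable: the displacement 0x0b lands on do_message.
print(hex(call_target(0x4000F0, bytes.fromhex("e8 0b 00 00 00"))))   # 0x400100

# Unlinked main.o: a zero displacement "calls" the very next instruction.
print(hex(call_target(0x0, bytes.fromhex("e8 00 00 00 00"))))        # 0x5
```

This matches what objdump prints in both cases: 'callq 400100 <do_message>' after linking and 'callq 5 <_start+0x5>' before.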

The same kind of relocations are used for variables as well as functions. The same process happens when you link your program against multiple large libraries, such as libc: you define a function called 'main', to which libc has an external reference; libc's startup code then runs before your program and calls your 'main' function when you run the executable.

Simple explanation:

Once the assembly source has been assembled into object code, the linker combines that object code into an executable containing instructions that the computer can load and run. The resulting machine code is decoded and executed by the CPU's control unit.
