Print a newline with as little code as possible using NASM

Question

I'm learning a bit of assembly for fun, and I'm probably too green to know the right terminology and find the answer myself.

I want to print a newline at the end of my program.

The code below works fine.

section .data
    newline db 10

section  .text
_end:
    mov rax, 1
    mov rdi, 1
    mov rsi, newline
    mov rdx, 1
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

But I'm hoping to achieve the same result without defining the newline in .data. Is it possible to call sys_write directly with the byte you want, or must it always be done with a reference to some predefined data (which I assume is what mov rsi, newline is doing)?

In short, why can't I replace mov rsi, newline with mov rsi, 10?

Answer 1

You always need the data in memory to copy it to a file descriptor. There is no system-call equivalent of C stdio fputc that takes data by value instead of by pointer.

mov rsi, newline puts a pointer into a register (with a huge mov r64, imm64 instruction). sys_write doesn't special-case size=1 and treat its void *buf arg as a char value if it's not a valid pointer.

There aren't any other system calls that would do the trick. pwrite and writev are both more complicated (taking a file offset as well as a pointer, or taking an array of pointer+length pairs to gather the data in kernel space).
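To make the failure mode concrete, here is a minimal sketch (not from the original answer) of what the question's mov rsi, 10 variant does. It assumes stdout is a terminal or a pipe and that address 10 is unmapped, which is the normal case on Linux. The kernel then treats 10 as an address, fails to read from it, and write returns -EFAULT (-14) without printing anything. The program exits with the negated return value so the error is visible from the shell.

```nasm
; Assumed build: nasm -felf64 efault.asm && ld -o efault efault.o
global _start

section .text
_start:
    mov   eax, 1        ; __NR_write
    mov   edi, 1        ; fd = stdout
    mov   esi, 10       ; NOT a pointer to '\n': the kernel reads this as address 10
    mov   edx, 1        ; length = 1
    syscall             ; expected to return -EFAULT (-14) in RAX

    neg   eax           ; 14 if write failed with -EFAULT
    mov   edi, eax      ; use the negated return value as the exit status
    mov   eax, 60       ; __NR_exit
    syscall
```

Running ./efault; echo $? in a terminal should then show 14 and no extra blank line from the program itself.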

There is a lot you can do to optimize this for code size, though. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code

First, putting the newline character in static storage means you need to generate a static address in a register. Your options here are:

5-byte mov esi, imm32 (only in Linux non-PIE executables, where static addresses are link-time constants known to be in the low 2GiB of virtual address space, and thus work as 32-bit zero-extended or sign-extended immediates).
7-byte lea rsi, [rel newline]. Works everywhere; the only good option if you can't use the 5-byte mov-immediate.
10-byte mov rsi, imm64. This works even in PIE executables (e.g. if you link with gcc -nostdlib without -static, on a distro where PIE is the default), but only via a runtime relocation fixup, and the code size is terrible. Compilers never use this because it's not faster than LEA.
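The three options can be put side by side in one file. This is a sketch assuming a static, non-PIE link with plain ld, which is the only setup where all three forms are valid at once. The file name and the use of .rodata are illustrative choices. Each instruction leaves the same pointer in RSI, so only the last one matters at run time; the sizes and opcode bytes in the comments can be checked with objdump -d.

```nasm
; Assumed build: nasm -felf64 addr.asm && ld -o addr addr.o   (static, non-PIE)
global _start

section .rodata
newline: db 10

section .text
_start:
    mov   esi, newline          ; 5 bytes  (BE imm32): zero-extends into RSI, non-PIE only
    lea   rsi, [rel newline]    ; 7 bytes  (48 8D 35 rel32): RIP-relative, works everywhere
    mov   rsi, newline          ; 10 bytes (48 BE imm64): absolute 64-bit address

    mov   eax, 1                ; __NR_write
    mov   edi, 1                ; fd = stdout
    mov   edx, 1                ; length = 1
    syscall

    mov   eax, 60               ; __NR_exit
    xor   edi, edi              ; status = 0
    syscall
```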

But we can avoid static addressing entirely: use push to put immediate data on the stack. This works even if we need zero-terminated strings, because push imm8 and push imm32 both sign-extend the immediate to 64-bit. Since ASCII uses the low half of the 0..255 range, this is equivalent to zero-extension.
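As an illustration of the zero-termination point, a 4-character string fits in a single push imm32. The sketch below is an assumption-labelled example rather than part of the original answer: the string "Hi!\n" is hypothetical, and it is written as a hex constant so that the little-endian byte order is explicit. Because the top bit of the immediate is clear, sign extension fills the upper four bytes of the stack slot with zeros, which act as the terminator.

```nasm
; Assumed build: nasm -felf64 pushstr.asm && ld -o pushstr pushstr.o
global _start

section .text
_start:
    push  0x0a216948    ; push imm32: bytes 'H','i','!',10 in memory, then four zero bytes
    push  rsp
    pop   rsi           ; RSI = pointer to the string on the stack

    push  1
    pop   rax           ; __NR_write
    mov   edi, eax      ; fd = 1 (stdout)
    push  4
    pop   rdx           ; length = 4 (the zero terminator is not written)
    syscall

    mov   eax, 60       ; __NR_exit
    xor   edi, edi      ; status = 0
    syscall
```

Here write takes an explicit length, so the terminator only matters if the same pointer is later handed to something that expects a C string.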

Then we just need to copy RSP to RSI, because push leaves RSP pointing to the data that was pushed. mov rsi, rsp would be 3 bytes because it needs a REX prefix. If you were targeting 32-bit code or the x32 ABI (32-bit pointers in long mode) you could use 2-byte mov esi, esp. But Linux puts the stack pointer at the top of user virtual address space, so on x86-64 that's 0x00007ff..., right at the top of the low canonical range. So truncating a pointer to stack memory to 32 bits isn't an option; we'd get -EFAULT.

But we can copy a 64-bit register with 1-byte push + 1-byte pop. (Assuming neither register needs a REX prefix to access.)

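default rel     ; We don't use any explicit addressing modes, but no reason to leave this out.

global _start   ; export the entry point; without this, ld warns and falls back to the start of .text
_start: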
    push   10         ; \n

    push   rsp
    pop    rsi        ; 2 bytes total vs. 3 for mov rsi,rsp

    push   1          ; __NR_write call number
    pop    rax        ; 3 bytes, vs. 5 for mov eax, 1

    mov    edx, eax   ; length = call number by coincidence
    mov    edi, eax   ; fd = length = call number  also coincidence
    syscall           ;   write(1, "\n", 1)

    mov    al, 60     ; assuming write didn't return -errno, replace the low byte and keep the high zeros
    ;xor    edi, edi    ; leave rdi = 1  from write
    syscall           ; _exit(1)

.size: db $ - _start

xor-zeroing is the best-known x86 peephole optimization: it saves 3 bytes of code size and is actually more efficient than mov edi, 0. But you only asked for the smallest code to print a newline, without specifying that it had to exit with status = 0, so we can save 2 bytes by leaving it out.

Since we're just making an _exit system call, we don't need to clean up the stack from the 10 we pushed.

BTW, this will crash if the write returns an error (e.g. stdout redirected to /dev/full, or closed with ./newline >&-, or whatever other condition). That would leave RAX = -something, so mov al, 60 would give us RAX = 0xffff...3c. Then we'd get -ENOSYS from the invalid call number, fall off the end of _start, and decode whatever is next as instructions. (Probably zero bytes, which decode with [rax] as an addressing mode. Then we'd fault with a SIGSEGV.)
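If the crash on a failed write or the nonzero exit status matters, a slightly larger variant avoids both. The following is a sketch and not part of the original answer: it sets the whole of RAX with push/pop instead of patching only AL, and it puts the xor edi, edi back. By my count that costs 3 extra bytes, for 20 bytes in total.

```nasm
; Assumed build: nasm -felf64 newline0.asm && ld -o newline0 newline0.o
global _start

section .text
_start:
    push  10            ; '\n' on the stack
    push  rsp
    pop   rsi           ; RSI = pointer to the newline

    push  1
    pop   rax           ; __NR_write
    mov   edx, eax      ; length = 1
    mov   edi, eax      ; fd = 1 (stdout)
    syscall             ; write(1, "\n", 1); the return value is ignored

    push  60
    pop   rax           ; 3 bytes: full RAX = __NR_exit even if write returned -errno
    xor   edi, edi      ; 2 bytes: exit status 0
    syscall             ; _exit(0)
```

The exit status is always 0 here, so a caller can't tell whether the write succeeded; that is a deliberate trade for size in this sketch.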

objdump -d -Mintel disassembly of that code, after building with nasm -felf64 and linking with ld:

0000000000401000 <_start>:
  401000:       6a 0a                   push   0xa
  401002:       54                      push   rsp
  401003:       5e                      pop    rsi
  401004:       6a 01                   push   0x1
  401006:       58                      pop    rax
  401007:       89 c2                   mov    edx,eax
  401009:       89 c7                   mov    edi,eax
  40100b:       0f 05                   syscall 
  40100d:       b0 3c                   mov    al,0x3c
  40100f:       0f 05                   syscall 

0000000000401011 <_start.size>:
  401011:       11                      .byte 0x11

So the total code size is 0x11 = 17 bytes, vs. your version with 39 bytes of code + 1 byte of static data. Your first 3 mov instructions alone are 5, 5, and 10 bytes long. (Or 7 bytes long for mov rax,1 if you use YASM, which doesn't optimize it to mov eax,1.)

Running it:

$ strace ./newline 
execve("./newline", ["./newline"], 0x7ffd4e98d3f0 /* 54 vars */) = 0
write(1, "\n", 1
)                       = 1
exit(1)                                 = ?
+++ exited with 1 +++

If this was part of a larger program:

If you already have a pointer to some nearby static data in a register, you could do something like a 4-byte lea rsi, [rdx + newline-foo] (REX.W + opcode + modrm + disp8), assuming the newline-foo offset fits in a sign-extended disp8 and that RDX holds the address of foo.

Then you can have newline: db 10 in static storage after all. (Put it in .rodata or .data, depending on which section you already had a pointer to.)
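A small standalone sketch of that idea follows. The contents of foo are hypothetical, and the first lea only stands in for a pointer that a larger program would already have in RDX. Since foo and newline live in the same section, NASM can resolve newline - foo to a constant (5 here) at assembly time and pick the disp8 encoding.

```nasm
; Assumed build: nasm -felf64 nearby.asm && ld -o nearby nearby.o
global _start

section .rodata
foo:     db "hello"     ; hypothetical nearby data, 5 bytes
newline: db 10          ; located 5 bytes after foo

section .text
_start:
    lea   rdx, [rel foo]                ; stand-in for a pointer the program already holds
    lea   rsi, [rdx + newline - foo]    ; 4 bytes: 48 8D 72 05

    mov   eax, 1        ; __NR_write
    mov   edi, eax      ; fd = 1 (stdout)
    mov   edx, eax      ; length = 1 (overwrites the foo pointer, no longer needed)
    syscall

    mov   eax, 60       ; __NR_exit
    xor   edi, edi      ; status = 0
    syscall
```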

Answer 2

It expects the address of the string in the rsi register, not a character or string.

mov rsi, newline loads the address of newline into rsi.

Answer 1: score 5, accepted, 2019-07-12 07:50:08
Answer 2: score 2, 2019-07-12 07:40:02