
Convert integer to string in assembly (64-bit, NASM) as fast as possible

I wrote a printint function in 64-bit NASM that prints an integer to STDOUT. It's really slow though, and after doing some benchmarks I determined that converting an integer to a string is the slowest part by far.

My current strategy for converting ints to strings goes like this:

  1. If the number is 0, skip to a special case.
  2. If the number is negative, print a negative sign, turn it into a positive integer, and continue
  3. Create a 10-byte buffer (enough to hold all 32-bit integers) and a pointer pointing to the back of the buffer
  4. Check if the number is 0; if it is, we're done.
  5. Divide the number by 10, convert the remainder into ASCII
  6. Put the remainder into the buffer (from back to front)
  7. Decrement the buffer pointer
  8. Loop back to step 4
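For reference, here is a C sketch of steps 1-8. C is used only because it maps almost line for line onto the assembly; this is not the asker's code. The name int_to_str_basic and its signature are hypothetical. The sketch writes the '-' into the output string instead of printing it, which is an assumption made to keep it self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: formats num into out, which must hold at least
 * 12 bytes (sign, up to 10 digits, NUL). Returns the string length. */
size_t int_to_str_basic(int32_t num, char *out)
{
    char buf[10];               /* enough for the 10 digits of 2147483648 */
    char *p = buf + sizeof buf; /* one past the back of the buffer */
    size_t len = 0;

    if (num == 0) {             /* step 1: special case */
        out[0] = '0';
        out[1] = '\0';
        return 1;
    }

    uint32_t n = (uint32_t)num;
    if (num < 0) {              /* step 2: emit '-' and take the magnitude */
        out[len++] = '-';
        n = 0u - n;             /* well-defined even for INT32_MIN */
    }

    while (n != 0) {            /* step 4: done once the number reaches 0 */
        /* steps 5-7: remainder -> ASCII, stored from back to front */
        *--p = (char)('0' + n % 10);
        n /= 10;
    }

    size_t digits = (size_t)(buf + sizeof buf - p);
    memcpy(out + len, p, digits);
    len += digits;
    out[len] = '\0';
    return len;
}
```

The pointer is decremented before each store here, so it ends up pointing at the first digit rather than one byte before it.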

I've searched for how other people do it, and their approach is more or less the same as mine: divide by 10 until the number is 0.

Here's the relevant code:

printint:                       ; num in edi
    push rbp                    ; save base pointer
    mov rbp, rsp                ; set up the stack frame
    push rbx                    ; rbx is callee-saved in the SysV ABI, so preserve it
    sub rsp, 24                 ; reserve buffer space (rsp stays 16-byte aligned)
    cmp edi, 0                  ; compare num to 0
    je _printint_zero           ; 0 is special case
    cmp edi, 0
    jg _printint_pos            ; don't print negative sign if positive

    ; print a negative sign (code not relevant)
    xor edi, -1                 ; two's complement negation: invert the bits...
    add edi, 1                  ; ...and add 1 (INT_MIN becomes 0x80000000, which div treats as 2147483648)
_printint_pos:
    mov rbx, rsp                ; set rbx to point to the last byte of the buffer
    add rbx, 17
    mov qword [rsp+8], 0        ; clear the buffer
    mov word [rsp+16], 0        ; 10 bytes from [8,18)
_printint_loop:
    cmp edi, 0                  ; compare edi to 0
    je _printint_done           ; if edi == 0 then we are done
    xor edx, edx                ; edx:eax = dividend
    mov eax, edi
    mov ecx, 10
    div ecx                     ; unsigned divide: eax = quotient, edx = remainder
    mov edi, eax                ; move quotient back to edi
    add dl, 48                  ; convert remainder to ASCII ('0' = 48)
    mov byte [rbx], dl          ; move remainder to buffer
    dec rbx                     ; shift 1 position to the left in buffer
    jmp _printint_loop
_printint_done:
    ; print the buffer (code not relevant)
    mov rbx, [rbp-8]            ; restore callee-saved rbx
    mov rsp, rbp                ; restore stack and base pointers
    pop rbp
    ret

How can I optimize it so that it can run much faster? Alternatively, is there a significantly better method to convert an integer to a string?

I do not want to use printf or any other function from the C standard library.

It turns out I was wrong about the source of the bottleneck: my benchmark was flawed. Although micro-optimizations such as magic number multiplication and better loops did help, the biggest bottleneck was the syscalls, i.e. making one for every small read and write.

By using buffered reading & writing (buffer size of 16 kB), I was able to achieve my goal of reading and printing integers faster than scanf and printf.

Creating an output buffer sped up one particular benchmark by over 4x, whereas the micro-optimizations sped it up by about 25%.
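A minimal sketch of such an output buffer, assuming a POSIX system. write_buf, flush_buf and OUT_BUF_SIZE are hypothetical names, and the POSIX write() wrapper stands in for a raw sys_write to keep the example short.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define OUT_BUF_SIZE 16384          /* 16 kB, the size mentioned above */

static char out_buf[OUT_BUF_SIZE];
static size_t out_len = 0;

/* Write everything buffered so far to stdout, retrying on short writes.
 * Returns 0 on success, -1 if write() fails. */
int flush_buf(void)
{
    size_t done = 0;
    while (done < out_len) {
        ssize_t n = write(STDOUT_FILENO, out_buf + done, out_len - done);
        if (n < 0)
            return -1;              /* the caller decides how to handle errors */
        done += (size_t)n;
    }
    out_len = 0;
    return 0;
}

/* Append len bytes to the buffer; only call write() when it fills up. */
int write_buf(const char *data, size_t len)
{
    while (len > 0) {
        size_t space = OUT_BUF_SIZE - out_len;
        size_t chunk = len < space ? len : space;
        memcpy(out_buf + out_len, data, chunk);
        out_len += chunk;
        data += chunk;
        len -= chunk;
        if (out_len == OUT_BUF_SIZE && flush_buf() != 0)
            return -1;
    }
    return 0;
}
```

Call flush_buf() once before the program exits, otherwise whatever is still in the buffer never reaches stdout.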

For anyone who stumbles across this post in the future, here are the optimizations I made:

  1. Replaced all sys_write calls with calls to write_buf, which appends the output to a buffer and only makes the sys_write syscall when the buffer becomes full. The implementation of write_buf is left as an exercise for the reader.
  2. Replaced the division with multiplication by a magic number (a fixed-point reciprocal of 10), which saves a few clock cycles per loop iteration.
  3. Changed the while loops into do...while loops, saving one jmp instruction per loop iteration (which adds up over many iterations).
  4. Optimized individual instructions (e.g. using neg instead of xor + add) and removed redundant ones, such as the second cmp edi, 0.
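Items 2 to 4 can be combined as in the C sketch below (hypothetical names div10 and int_to_str_fast; C stands in for the NASM). The constant 0xCCCCCCCD is ceil(2^35 / 10), and for every 32-bit unsigned n, taking the 64-bit product n * 0xCCCCCCCD and shifting it right by 35 gives exactly n / 10. This is the same substitution compilers make for unsigned division by 10. The do...while form also turns 0 into "0" without a separate special case.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* n / 10 for any 32-bit unsigned n: multiply by the magic constant
 * 0xCCCCCCCD = ceil(2^35 / 10) in 64 bits, then shift right by 35. */
static inline uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical helper: out must hold at least 12 bytes. */
size_t int_to_str_fast(int32_t num, char *out)
{
    char buf[11];                     /* '-' plus up to 10 digits */
    char *p = buf + sizeof buf;
    uint32_t n = (uint32_t)num;

    if (num < 0)
        n = 0u - n;                   /* unsigned negation (what neg does); INT32_MIN safe */

    /* do...while: the loop test sits at the bottom, so the body runs at
     * least once and 0 is printed as "0" with no extra branch. */
    do {
        uint32_t q = div10(n);
        *--p = (char)('0' + (n - q * 10));  /* remainder = n - 10 * q */
        n = q;
    } while (n != 0);

    if (num < 0)
        *--p = '-';

    size_t len = (size_t)(buf + sizeof buf - p);
    memcpy(out, p, len);
    out[len] = '\0';
    return len;
}
```

The remainder is recovered with a multiply and a subtract instead of a second division, so each iteration needs no div at all.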

Another potential improvement I could make (but didn't) is dividing by a larger base and using a lookup table, as mentioned by phuclv in the comments.
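One possible reading of that suggestion, sketched under assumptions: divide by 100 instead of 10, which halves the number of divisions, and copy both digits of each remainder from a 200-byte table of the pairs "00" to "99". The table name digit_pairs and the function uint_to_str_pairs are hypothetical, and the sketch handles unsigned values only. The sign can be handled as in steps 1-2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "00" "01" ... "99": the pair for i lives at bytes 2*i and 2*i+1. */
static const char digit_pairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Hypothetical helper: out must hold at least 11 bytes. */
size_t uint_to_str_pairs(uint32_t n, char *out)
{
    char buf[10];                    /* 4294967295 has 10 digits */
    char *p = buf + sizeof buf;

    while (n >= 100) {               /* two digits per division */
        uint32_t q = n / 100;
        uint32_t r = n - q * 100;
        p -= 2;
        memcpy(p, &digit_pairs[2 * r], 2);
        n = q;
    }
    if (n >= 10) {                   /* final two digits */
        p -= 2;
        memcpy(p, &digit_pairs[2 * n], 2);
    } else {                         /* final single digit (also covers 0) */
        *--p = (char)('0' + n);
    }

    size_t len = (size_t)(buf + sizeof buf - p);
    memcpy(out, p, len);
    out[len] = '\0';
    return len;
}
```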
