
How is BareMetalOS allocating memory in Assembly without malloc, brk, or mmap?

As a learning experiment, I am interested in creating a hashtable in assembly (x86-64 in NASM on OSX). One of the requirements is to be able to dynamically allocate and manage memory.

After looking through many resources on how to allocate memory in assembly, I found that most of them recommend either the brk or mmap syscalls. I haven't learned exactly how these work yet, because I found another implementation of memory allocation in BareMetal-OS that doesn't use any system calls (their code is copied below).

My question is: how are they doing this? Can you explain the relevant instructions in their assembly that perform the memory allocation, for someone without a systems programming background who is new to assembly? The reason I want to understand how to implement memory allocation in assembly is so that I can implement a hashtable in assembly.

Being new to assembly (I mainly write JavaScript), and not having found any detailed resources on memory allocation in assembly yet, I don't know where to start. It may be obvious to you, but you have the background, which I don't. I have written some assembly over the past week or two, so I understand the basics of mov on registers and the jump instructions, but I don't yet understand the additional things they are doing to implement this memory management. My thinking is: if they can implement memory allocation in assembly without brk or mmap, then I want to do it that way, because then I am really manipulating memory directly without any system layers, and it seems like you can really fine-tune things.

Here is their code, copied from GitHub:

https://github.com/ReturnInfinity/BareMetal-OS/blob/master/os/syscalls/memory.asm

; =============================================================================
; BareMetal -- a 64-bit OS written in Assembly for x86-64 systems
; Copyright (C) 2008-2014 Return Infinity -- see LICENSE.TXT
;
; Memory functions
; =============================================================================

align 16
db 'DEBUG: MEMORY   '
align 16


; -----------------------------------------------------------------------------
; os_mem_allocate -- Allocates the requested number of 2 MiB pages
;  IN:  RCX = Number of pages to allocate
; OUT:  RAX = Starting address (Set to 0 on failure)
; This function will only allocate continuous pages
os_mem_allocate:
  push rsi
  push rdx
  push rbx

  cmp rcx, 0
  je os_mem_allocate_fail   ; At least 1 page must be allocated

  ; Here, we'll load the last existing page of memory in RSI.
  ; RAX and RSI instructions are purposefully interleaved.

  xor rax, rax
  mov rsi, os_MemoryMap   ; First available memory block
  mov eax, [os_MemAmount]   ; Total memory in MiB from a double-word
  mov rdx, rsi      ; Keep os_MemoryMap unmodified for later in RDX
  shr eax, 1      ; Divide actual memory by 2

  sub rsi, 1
  std       ; Set direction flag to backward
  add rsi, rax      ; RSI now points to the last page

os_mem_allocate_start:      ; Find a free page of memory, from the end.
  mov rbx, rcx      ; RBX is our temporary counter

os_mem_allocate_nextpage:
  lodsb
  cmp rsi, rdx      ; We have hit the start of the memory map, no more free pages
  je os_mem_allocate_fail

  cmp al, 1
  jne os_mem_allocate_start ; Page is taken, start counting from scratch

  dec rbx       ; We found a page! Any page left to find?
  jnz os_mem_allocate_nextpage

os_mem_allocate_mark:     ; We have a suitable free series of pages. Allocate them.
  cld       ; Set direction flag to forward

  xor rdi, rsi      ; We swap rdi and rsi to keep rdi contents.
  xor rsi, rdi
  xor rdi, rsi

  ; Instructions are purposefully swapped at some places here to avoid
  ; direct dependencies line after line.
  push rcx      ; Keep RCX as is for the 'rep stosb' to come
  add rdi, 1
  mov al, 2
  mov rbx, rdi      ; RBX points to the starting page
  rep stosb
  mov rdi, rsi      ; Restoring RDI
  sub rbx, rdx      ; RBX now contains the memory page number
  pop rcx       ; Restore RCX

  ; Only dependency left is between the two next lines.
  shl rbx, 21     ; Quick multiply by 2097152 (2 MiB) to get the starting memory address
  mov rax, rbx      ; Return the starting address in RAX
  jmp os_mem_allocate_end

os_mem_allocate_fail:
  cld       ; Set direction flag to forward
  xor rax, rax      ; Failure so set RAX to 0 (No pages allocated)

os_mem_allocate_end:
  pop rbx
  pop rdx
  pop rsi
  ret
; -----------------------------------------------------------------------------


; -----------------------------------------------------------------------------
; os_mem_release -- Frees the requested number of 2 MiB pages
;  IN:  RAX = Starting address
;       RCX = Number of pages to free
; OUT:  RCX = Number of pages freed
os_mem_release:
  push rdi
  push rcx
  push rax

  shr rax, 21     ; Quick divide by 2097152 (2 MiB) to get the starting page number
  add rax, os_MemoryMap
  mov rdi, rax
  mov al, 1
  rep stosb

  pop rax
  pop rcx
  pop rdi
  ret
; -----------------------------------------------------------------------------


; -----------------------------------------------------------------------------
; os_mem_get_free -- Returns the number of 2 MiB pages that are available
;  IN:  Nothing
; OUT:  RCX = Number of free 2 MiB pages
os_mem_get_free:
  push rsi
  push rbx
  push rax

  mov rsi, os_MemoryMap
  xor rcx, rcx
  xor rbx, rbx

os_mem_get_free_next:
  lodsb
  inc rcx
  cmp rcx, 65536
  je os_mem_get_free_end
  cmp al, 1
  jne os_mem_get_free_next
  inc rbx
  jmp os_mem_get_free_next

os_mem_get_free_end:
  mov rcx, rbx

  pop rax
  pop rbx
  pop rsi
  ret
; -----------------------------------------------------------------------------


; -----------------------------------------------------------------------------
; os_mem_copy -- Copy a number of bytes
;  IN:  RSI = Source address
;       RDI = Destination address
;       RCX = Number of bytes to copy
; OUT:  Nothing, all registers preserved
os_mem_copy:
  push rdi
  push rsi
  push rcx

  rep movsb     ; Optimize this!

  pop rcx
  pop rsi
  pop rdi
  ret
; -----------------------------------------------------------------------------


; =============================================================================
; EOF
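The two bookkeeping routines in the listing, os_mem_release and os_mem_get_free, only read and write bytes of the memory map. They can be modelled in C roughly as shown below. This is a hypothetical sketch, not BareMetal's code: `memory_map`, `mem_release` and `mem_get_free` are illustrative stand-ins for os_MemoryMap and the two assembly routines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_ENTRIES 65536u   /* one byte per 2 MiB page, like os_MemoryMap */
#define PAGE_SHIFT  21       /* 2 MiB = 1 << 21 */
#define PAGE_FREE   1
#define PAGE_USED   2

static uint8_t memory_map[MAP_ENTRIES];  /* hypothetical stand-in for os_MemoryMap */

/* Model of os_mem_release: no bounds or ownership checks, same as the asm. */
void mem_release(uint64_t addr, uint64_t count) {
    uint64_t first = addr >> PAGE_SHIFT;              /* shr rax, 21 */
    memset(&memory_map[first], PAGE_FREE, count);     /* mov al, 1 / rep stosb */
}

/* Model of os_mem_get_free. The asm increments RCX and compares it with
   65536 before looking at AL, so the last map byte is never counted. */
uint64_t mem_get_free(void) {
    uint64_t free_pages = 0;                          /* RBX */
    for (uint32_t i = 0; i < MAP_ENTRIES - 1; i++)
        if (memory_map[i] == PAGE_FREE) free_pages++; /* cmp al, 1 / inc rbx */
    return free_pages;
}
```

"Freeing" memory here is only a matter of writing 1s back into the map. Nothing is handed back to any other layer, because the OS itself owns all of the memory.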

Also note: I have read many resources on creating hashtables in C, one of which I have copied here (it has the C code and the corresponding assembly). However, pretty much all of the C examples use malloc, which I want to avoid. I am trying to learn assembly without depending on C at all.

Also, this resource from Quora was helpful in pointing to the places in the malloc.c source code where brk and mmap are used. However, I haven't studied it yet, because I then discovered the BareMetal-OS memory.asm code, which seems to allocate memory without using those syscalls at all. Hence the question: how are they doing that? Can you explain the relevant instructions in their assembly that perform the memory allocation?

Update

This book explains pretty much everything about how memory works below mmap and brk; it all falls under implementing operating systems: http://www.amazon.com/Modern-Operating-Systems-4th-Edition/dp/013359162X

In order to manage memory, your code needs to "own" some memory. The problem is that on any machine that has an operating system, the operating system owns all of the memory. So your code has to ask the operating system for some memory, which it can do with the brk or mmap system calls, or with malloc (a C library function that in turn gets its memory from system calls such as brk or mmap).

So, for example, if you want to write a memory manager in assembly and you have a machine with 4GB of memory, it would not be unreasonable to request 1GB of memory from malloc at the start of the program, and then manage that memory any way you like.
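The approach above (one large request at startup, then your own bookkeeping) can be sketched in C as follows. This is a minimal sketch under stated assumptions, not BareMetal's code. It swaps the answer's malloc for a single mmap call, since mmap is the system call the question already mentions. The names `Arena`, `arena_init`, `arena_alloc` and `arena_destroy` are hypothetical. Memory is handed out by bumping an offset, so individual blocks cannot be freed, only the whole region.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON   /* some macOS headers only define MAP_ANON */
#endif

/* Hypothetical arena: one region requested from the OS up front. */
typedef struct {
    uint8_t *base;   /* start of the region returned by mmap */
    size_t   size;   /* total bytes owned by the arena */
    size_t   used;   /* bytes handed out so far */
} Arena;

/* Ask the OS for `size` bytes once; returns 0 on success, -1 on failure. */
int arena_init(Arena *a, size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Bump allocation: round `used` up to a 16-byte boundary, then advance it. */
void *arena_alloc(Arena *a, size_t n) {
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->size || n > a->size - start) return NULL; /* region exhausted */
    a->used = start + n;
    return a->base + start;
}

/* Give the whole region back to the OS. */
void arena_destroy(Arena *a) {
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

In assembly the same idea is one mmap system call at startup, followed by nothing more than adding to a pointer kept in a register or variable. A hashtable could then take its bucket array and entries from that region.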

The assembly code from BareMetal-OS really doesn't apply to your situation, because BareMetal *is* the operating system, and therefore doesn't need to ask anyone for memory. It already owns all of the memory, and can manage it any way it likes.

Following on from other comments and answers, the reason BareMetal-OS can implement allocation in this manner is that it relies on additional definitions that are not present in the posted code and are not provided by assemblers such as NASM. Specifically, the posted code relies on:

os_MemoryMap
os_MemAmount

These are not calls but BareMetal-OS-specific symbols defined elsewhere in the OS source. os_MemoryMap is the address of the memory map array, and os_MemAmount is a variable holding the amount of installed memory in MiB. Without some external library (e.g. libc or a memory-manager library), you are limited to system calls such as brk (syscall number 45 on 32-bit x86 Linux and 12 on x86-64 Linux). Hopefully this adds another piece to the puzzle. Good luck.

This post explains the assembly code for the os_mem_allocate function. The basic idea is that memory is allocated in 2MB chunks. An array of 65536 bytes (os_MemoryMap) keeps track of which chunks are free and which are used. A value of 1 marks a free chunk, and a value of 2 marks a used chunk. The total amount of memory that could be managed is 64K * 2MB = 128GB. Since most machines don't have that much memory, another variable (os_MemAmount) holds the memory size of the machine (in MB).
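The capacity arithmetic in the paragraph above can be checked with a few lines of C. The constants and helper names are illustrative only and do not come from BareMetal.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_ENTRIES 65536ull          /* bytes in os_MemoryMap */
#define CHUNK_BYTES (2ull << 20)      /* each entry describes one 2 MiB chunk */

/* Bytes of RAM described by the first `entries` map entries. */
uint64_t bytes_covered(uint64_t entries) {
    return entries * CHUNK_BYTES;
}

/* Map entries that correspond to real RAM, given os_MemAmount in MiB. */
uint64_t entries_for_ram(uint32_t mem_amount_mib) {
    return mem_amount_mib / 2;        /* same as `shr eax, 1` */
}
```

For example, a machine with 8 GiB (8192 MiB) of RAM uses only the first 4096 entries of the map. The full 65536 entries would describe 2^16 * 2^21 = 2^37 bytes, i.e. 128 GiB.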

The input to the os_mem_allocate function is a count, i.e. how many 2MB chunks to allocate. The function only allocates contiguous chunks. For example, if the input request is 3, the function attempts to allocate 6MB of memory by searching the array for three 1's in a row. The return value is a pointer to the allocated memory, or 0 if the request could not be satisfied.

The input count is passed in rcx. The code verifies that the request is for a non-zero number of chunks; an input of 0 results in a return value of 0.

os_mem_allocate:
    push rsi                  ; save some registers
    push rdx
    push rbx

    cmp rcx, 0                ; Is the count 0?
    je os_mem_allocate_fail   ; If YES, then return 0

The code does a roundabout calculation to point rsi at the last usable byte of the 65536-byte array. The last two lines of the following snippet are the most interesting. Setting the direction flag means that subsequent lodsb instructions will decrement rsi, and pointing rsi at the last usable byte in the array is the whole point of the calculation.

    xor rax, rax
    mov rsi, os_MemoryMap   ; Get the address of the 65536 byte array into RSI
    mov eax, [os_MemAmount] ; Get the memory size in MB into EAX
    mov rdx, rsi            ; Keep os_MemoryMap in RDX for later use
    shr eax, 1              ; Divide by 2 because os_MemAmount is in MB, but chunks are 2MB

    sub rsi, 1              ; in C syntax, we're calculating &array[amount/2-1], which is the address of the last usable byte in the array
    std                     ; Set direction flag to backward
    add rsi, rax            ; RSI now points to the last byte
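Here is the `&array[amount/2-1]` calculation from the comment above, written out in C. This is a hypothetical helper, not BareMetal code. The assembly computes `map - 1` first and then adds `amount/2`. In C, forming a pointer before the start of an array is undefined behaviour, so the sketch adds the combined offset in one step.

```c
#include <assert.h>
#include <stdint.h>

/* Returns &memory_map[mem_amount_mib / 2 - 1]: the entry RSI points at
   after the setup lines, i.e. where the backward scan starts. */
uint8_t *last_usable_entry(uint8_t *memory_map, uint32_t mem_amount_mib) {
    uint32_t entries = mem_amount_mib >> 1;   /* shr eax, 1: MiB -> 2 MiB chunks */
    assert(entries >= 1);                     /* the asm assumes at least one chunk */
    return memory_map + (entries - 1);        /* sub rsi, 1 / add rsi, rax */
}
```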

Next, the code has a loop that searches for N contiguous free chunks, where N is the count that was passed to the function in rcx. The loop scans backwards through the array looking for N 1's in a row, and succeeds if rbx reaches 0. Any time the loop finds a byte other than 1 (i.e. a used chunk), it resets rbx back to N.

os_mem_allocate_start:
    mov rbx, rcx                 ; RBX is the number of contiguous free chunks we need to find

os_mem_allocate_nextpage:
    lodsb                        ; read a byte into AL, and decrement RSI
    cmp rsi, rdx                 ; if RSI has reached the beginning of the array
    je os_mem_allocate_fail      ; then the loop has failed

    cmp al, 1                    ; Is the chunk free?
    jne os_mem_allocate_start    ; If NO, we need to restart the count

    dec rbx                      ; If YES, decrement the count
    jnz os_mem_allocate_nextpage ; If the count reaches zero we've succeeded, otherwise continue looping
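Here is the same search loop written in C, with each step commented against the instruction it models. This is an illustrative sketch: `find_free_run` is a hypothetical name, and array indices stand in for the RSI/RDX pointers. It keeps the assembly's order of operations: read, decrement, then compare with the start of the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_FREE 1   /* map value for a free 2 MiB chunk */

/* Hypothetical model of the os_mem_allocate search loop.
   map   : stands in for os_MemoryMap
   last  : index RSI starts at (memory size in MiB / 2 - 1)
   count : number of contiguous chunks wanted (RCX)
   Returns the index of the lowest chunk in the run, or -1 on failure. */
long find_free_run(const uint8_t *map, size_t last, size_t count) {
    assert(last >= 1);
    if (count == 0) return -1;          /* cmp rcx, 0 / je os_mem_allocate_fail */
    size_t i = last;                    /* RSI, expressed as an index */
    size_t needed = count;              /* mov rbx, rcx */
    for (;;) {
        uint8_t al = map[i];            /* lodsb reads the byte ... */
        i--;                            /* ... and, with DF set, decrements RSI */
        if (i == 0) return -1;          /* cmp rsi, rdx / je os_mem_allocate_fail */
        if (al != PAGE_FREE) {          /* cmp al, 1 / jne os_mem_allocate_start */
            needed = count;             /* restart the count */
            continue;
        }
        if (--needed == 0)              /* dec rbx / jnz os_mem_allocate_nextpage */
            return (long)(i + 1);       /* add rdi, 1: undo the extra decrement */
    }
}
```

Because the start-of-map check runs right after each decrement, the scan gives up as soon as it has read entry 1. As a result, this loop never hands out entries 0 and 1.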

At this point the code has found enough contiguous chunks to satisfy the request, so it marks all of those chunks as "used" by setting their bytes in the array to 2. The direction flag is cleared (set to forward) so that subsequent stosb instructions will increment rdi.

os_mem_allocate_mark:      ; We have a suitable free series of chunks, mark them as used
    cld                    ; Set direction flag to forward

    xor rdi, rsi           ; We swap RDI and RSI to keep RDI contents, but
    xor rsi, rdi           ; more importantly we want RDI to point to the
    xor rdi, rsi           ; location in the array where we want to write 2's

    push rcx               ; Save RCX since 'rep stosb' will modify it
    add rdi, 1             ; the previous loop decremented RSI too many times
    mov al, 2              ; the value 2 indicates a "used" chunk
    mov rbx, rdi           ; RBX is going to be used to calculate the return value
    rep stosb              ; store some 2's in the array, using the count in RCX
    mov rdi, rsi           ; Restoring RDI
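The two tricks in this snippet, the three-xor register swap and `rep stosb`, can be reproduced in C. This is an illustrative sketch: `xor_swap` and `mark_used` are hypothetical names, and `memset` stands in for `rep stosb` with the direction flag clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_USED 2   /* map value for a used 2 MiB chunk */

/* The three xor instructions swap RDI and RSI without a scratch register.
   (If a and b pointed at the same slot, this would zero it instead.) */
void xor_swap(uint64_t *a, uint64_t *b) {
    *a ^= *b;   /* xor rdi, rsi */
    *b ^= *a;   /* xor rsi, rdi */
    *a ^= *b;   /* xor rdi, rsi */
}

/* rep stosb with AL = 2 and the direction flag clear stores `count` bytes of 2
   moving forward from map[first]; RBX keeps `first` for the return value. */
size_t mark_used(uint8_t *map, size_t first, size_t count) {
    memset(map + first, PAGE_USED, count);
    return first;
}
```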

Next, the function needs to come up with a pointer to return to the caller.

    sub rbx, rdx           ; RBX is now an index into the 65536 byte array
    pop rcx                ; Restore RCX
    shl rbx, 21            ; Multiply by 2MB to convert the index to a pointer
    mov rax, rbx           ; Return the pointer in RAX
    jmp os_mem_allocate_end
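The index-to-pointer step is a single shift. The C helpers below are hypothetical and only restate that arithmetic, together with the inverse shift used by os_mem_release.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SHIFT 21   /* 2 MiB = 1 << 21 = 2097152 bytes */

/* sub rbx, rdx yields an index; shl rbx, 21 multiplies it by 2 MiB. */
uint64_t chunk_index_to_address(uint64_t index) {
    return index << CHUNK_SHIFT;
}

/* os_mem_release goes the other way with shr rax, 21 (rounding down). */
uint64_t address_to_chunk_index(uint64_t addr) {
    return addr >> CHUNK_SHIFT;
}
```

For example, a run starting at map entry 3 yields the address 3 * 2097152 = 6291456 (6 MiB).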

The next snippet handles errors by setting the return value to 0. Clearing the direction flag is important because, by convention, the direction flag must be clear (forward) when a function returns.

os_mem_allocate_fail:
    cld               ; Set direction flag to forward
    xor rax, rax      ; Failure so set RAX to 0 (No pages allocated)

Finally, restore the registers and return the pointer.

os_mem_allocate_end:
    pop rbx
    pop rdx
    pop rsi
    ret
