
Referencing symbols of code/data loaded separately to another part of memory

I have two NASM-syntax assembly files, let's say a.asm and b.asm.
They need to be assembled into two separate binary files, a.bin and b.bin.
On startup, a.bin will be loaded by another program to a fixed location in memory (0x1000).
b.bin will be loaded later to an arbitrary location in memory.
b.bin will use some of the functions defined in a.bin.
PROBLEM: b.bin does not know where the functions are located in a.bin.

Why do they need to be separate? They're unrelated; keeping b.bin (and many more files) and a.bin in one file would defeat the purpose of a file system.

Why not %include it? Memory usage. a.bin is quite a large set of functions taking up lots of memory, and because of the 640 KB memory limit in x86 real mode I can't really afford to have a copy of it in memory for every file that needs it.

Possible solution 1: just hardcode the locations.
Problem: what if I change something minor at the very start of a.bin? I'll need to update all pointers to everything after it, and that's not handy.

Possible solution 2: keep track of the function locations in one file, and %include that.
This is probably what I'll do if I have no other options. I might even be able to generate this file automatically if NASM can produce easy-to-parse symbol listings; otherwise it's still too much work.

Possible solution 3: keep a table in memory of where the functions are located, instead of the functions themselves. This also has the added benefit of backwards compatibility: if I do decide to change a.bin, everything using it doesn't have to change along with it.
Problem: indirect calls are really slow and take up lots of disk space, though really this is a minor issue. The table will also take up some space on disk and in memory.
My idea was to add this later, as a library or something like that. So everything that's compiled along with a.bin can call it faster by using direct calls, and things that are compiled separately, e.g. applications, can use the table for slower but safer access to a.bin.

TL;DR: How do I include labels from another asm file so that they can be called without including the actual code in the final assembled file?

You could proceed like this:

  1. Assemble and link a.bin so that it is loaded at address 0x1000.
  2. Use the nm utility (or similar) to dump the symbol table of a.bin. nm needs a file that still carries symbols, such as the object or ELF file the flat binary was produced from.
  3. Write a script that turns the symbol table into an assembly file asyms.asm containing, for each symbol in a.bin, a line of the form

     sym EQU addr 

     where addr is the actual address of sym as given by nm.

  4. Include or link asyms.asm when building b.bin. This makes the addresses of the symbols in a.bin visible to your assembler code without pulling in the corresponding code.
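Steps 2 and 3 could be scripted roughly as follows. This is a minimal sketch, not part of the original answer: it assumes nm prints its default three-column "address type name" format, that the symbols come from a hypothetical intermediate file a.elf, and that only global symbols should be exported. The function name nm_to_equ is made up for illustration.

```shell
#!/bin/sh
# Convert `nm` output on stdin into NASM EQU definitions on stdout.
# Assumption: default nm output with three columns, "address type name".
nm_to_equ() {
    # Keep only defined global symbols (text, data, bss, read-only, absolute).
    # Undefined symbols have just two fields and are skipped by the NF test;
    # local symbols have a lowercase type letter and are skipped by the regex.
    awk 'NF == 3 && $2 ~ /^[TDBRA]$/ { printf "%s EQU 0x%s\n", $3, $1 }'
}

# Hypothetical usage (a.elf is an assumed file name):
#   nm a.elf | nm_to_equ > asyms.asm
```

b.asm could then %include "asyms.asm" and call the functions by name. The file has to be regenerated whenever a.bin is rebuilt.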

What you are trying to do is known as building an overlay. I believe some assemblers and linkers do have support for this sort of thing, but I am not sure about the details.

You have a number of possibilities. This answer focuses on a hybrid of 1 and 2. Although you can create a table of function pointers, we can use direct calls to the routines in a common library by symbol name without copying the common library routines into each program. The method I would use is to rely on LD and linker scripts to create a shared library with a static location in memory, accessed via FAR CALLs (function addresses in segment:offset form) from independent programs loaded elsewhere in RAM.

Most people, when they start out, create a linker script that copies all the input sections into the output. It is also possible to create output sections that never appear in the output file (they are not LOADed), while the linker can still use the symbols of those non-loaded sections to resolve symbol addresses.

I've created a simple common library with print_banner and print_string functions that use BIOS functions to print to the console. Both are assumed to be called via FAR CALLs from other segments. You may have your common library loaded at 0x0100:0x0000 (physical address 0x01000) but called from code in other segments like 0x2000:0x0000 (physical address 0x20000). A sample commlib.asm file could look like:

bits 16

extern __COMMONSEG
global print_string
global print_banner
global _startcomm

section .text

; Function: print_string
;           Display a string to the console on specified display page
; Type:     FAR
;
; Inputs:   ES:SI = Address of string to print
;           BH = Display page
; Clobbers: AX, SI
; Return:   Nothing

print_string:               ; Routine: output string in SI to screen
    mov ah, 0x0e            ; BIOS tty Print
    jmp .getch
.repeat:
    int 0x10                ; print character
.getch:
    mov al, [es:si]         ; Get character from string
    inc si                  ; Advance pointer to next character
    test al,al              ; Have we reached end of string?
    jnz .repeat             ;     if not process next character
.end:
    retf                    ; Important: Far return

; Function: print_banner
;           Display a banner to the console to specified display page
; Type:     FAR
; Inputs:   BH = Display page
; Clobbers: AX, SI
; Return:   Nothing

print_banner:
    push es                 ; Save ES
    push cs
    pop es                  ; ES = CS
    mov si, bannermsg       ; SI = String to print
                            ; Far call to print_string
    call __COMMONSEG:print_string
    pop es                  ; Restore ES
    retf                    ; Important: Far return

_startcomm:                 ; Keep linker quiet by defining this

section .data
bannermsg: db "Welcome to this Library!", 13, 10, 0

We need a linker script that allows us to create a file that we can eventually load into memory. This code assumes the library will be loaded at segment 0x0100 and offset 0x0000 (physical address 0x01000):

commlib.ld:

OUTPUT_FORMAT("elf32-i386");
ENTRY(_startcomm);

/* Common Library at 0x0100:0x0000 = physical address 0x1000 */
__COMMONSEG    = 0x0100;
__COMMONOFFSET = 0x0000;

SECTIONS
{
    . = __COMMONOFFSET;

    /* Code and data for common library at VMA = __COMMONOFFSET */
    .commlib  : SUBALIGN(4) {
        *(.text)
        *(.rodata*)
        *(.data)
        *(.bss)
    }

    /* Remove unnecessary sections */
    /DISCARD/ : {
        *(.eh_frame);
        *(.comment);
    }
}

It is pretty simple: it effectively links a file commlib.o so that it can eventually be loaded at 0x0100:0x0000. A sample program that uses this library could look like:

prog.asm:

extern __COMMONSEG
extern print_banner
extern print_string
global _start

bits 16

section .text
_start:
    mov ax, cs                   ; DS=ES=CS
    mov ds, ax
    mov es, ax
    mov ss, ax                   ; SS:SP=CS:0x0000
    xor sp, sp

    xor bx, bx                   ; BH = page 0 to display on
    call __COMMONSEG:print_banner; FAR Call
    mov si, mymsg                ; String to display ES:SI
    call __COMMONSEG:print_string; FAR Call

    cli
.endloop:
    hlt
    jmp .endloop

section .data
mymsg: db "Printing my own text!", 13, 10, 0

The trick now is to make a linker script that can take a program like this and reference the symbols in our common library without actually adding the common library code again. This can be achieved by using the NOLOAD type on an output section in a linker script.

prog.ld:

OUTPUT_FORMAT("elf32-i386");
ENTRY(_start);

__PROGOFFSET   = 0x0000;

/* Load the commlib.elf file to access all its symbols */
INPUT(commlib.elf)

SECTIONS
{
    /* NOLOAD type prevents the actual code from being loaded into memory
       which means if you create a BINARY file from this, this section will
       not appear */
    . = __COMMONOFFSET;
    .commlib (NOLOAD) : {
        commlib.elf(.commlib);
    }

    /* Code and data for program at VMA = __PROGOFFSET */
    . = __PROGOFFSET;
    .prog : SUBALIGN(4) {
        *(.text)
        *(.rodata*)
        *(.data)
        *(.bss)
    }

    /* Remove unnecessary sections */
    /DISCARD/ : {
        *(.eh_frame);
        *(.comment);
    }
}

The common library's ELF file is read by the linker and the .commlib section is marked with a (NOLOAD) type. This prevents the final program from including the common library functions and data, but still allows us to reference the symbol addresses.
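One rough way to confirm that the library really was left out is to check that its banner string does not occur in the program binary. The helper below is only a sketch: file_contains is a made-up name, and stripping NUL bytes before searching is a choice made here to keep grep happy with binary input.

```shell
#!/bin/sh
# Succeed if the fixed string $2 occurs anywhere in the file $1.
# NUL bytes are removed first so that grep sees text-like input.
file_contains() {
    LC_ALL=C tr -d '\000' < "$1" | LC_ALL=C grep -q -F -e "$2"
}

# Hypothetical checks once commlib.bin and prog.bin have been built:
#   file_contains commlib.bin "Welcome to this Library!"   # should succeed
#   file_contains prog.bin    "Welcome to this Library!"   # should fail
```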

A simple test harness can be created as a bootloader. The bootloader will load the common library to 0x0100:0x0000 (physical address 0x01000), and the program that uses it to 0x2000:0x0000 (physical address 0x20000). The program address is arbitrary; I just picked it because it is in free memory below 1MB.
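The physical addresses quoted above follow from the real-mode rule physical = segment * 16 + offset. A small sketch of that arithmetic, with phys_addr as a made-up helper name:

```shell
#!/bin/sh
# Print the physical address of a real-mode segment:offset pair.
phys_addr() {
    # Shifting the segment left by 4 bits multiplies it by 16.
    printf '0x%05X\n' $(( ($1 << 4) + $2 ))
}

phys_addr 0x0100 0x0000   # common library: prints 0x01000
phys_addr 0x2000 0x0000   # program:        prints 0x20000
```

The sketch does not model address wrap-around above 1MB.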

boot.asm:

org 0x7c00
bits 16

start:
    ; DL = boot drive number from BIOS

    ; Set up stack and segment registers
    xor ax, ax               ; DS = 0x0000
    mov ds, ax
    mov ss, ax               ; SS:SP=0x0000:0x7c00 below bootloader
    mov sp, 0x7c00
    cld                      ; Set direction flag forward for String instructions

    ; Reset drive
    xor ax, ax
    int 0x13

    ; Read 2nd sector (commlib.bin) to 0x0100:0x0000 = phys addr 0x01000
    mov ah, 0x02             ; Drive READ subfunction
    mov al, 0x01             ; Read one sector
    mov bx, 0x0100
    mov es, bx               ; ES=0x0100
    xor bx, bx               ; ES:BX = 0x0100:0x0000 = phys address 0x01000
    mov cx, 0x0002           ; CH = Cylinder = 0, CL = Sector # = 2
    xor dh, dh               ; DH = Head = 0
    int 0x13

    ; Read 3rd sector (prog.bin) to 0x2000:0x0000 = phys addr 0x20000
    mov ah, 0x02             ; Drive READ subfunction
    mov al, 0x01             ; Read one sector
    mov bx, 0x2000
    mov es, bx               ; ES=0x2000
    xor bx, bx               ; ES:BX = 0x2000:0x0000 = phys address 0x20000
    mov cx, 0x0003           ; CH = Cylinder = 0, CL = Sector # = 3
    xor dh, dh               ; DH = Head = 0
    int 0x13

    ; Jump to the entry point of our program
    jmp 0x2000:0x0000

    times 510-($-$$) db 0
    dw 0xaa55

After the bootloader loads the common library (the second sector on disk) and the program (the third sector) into memory, it jumps to the entry point of the program at 0x2000:0x0000.


Putting it All Together

We can create the file commlib.bin with:

nasm -f elf32 commlib.asm -o commlib.o
ld -melf_i386 -nostdlib -nostartfiles -T commlib.ld -o commlib.elf commlib.o
objcopy -O binary commlib.elf commlib.bin

commlib.elf is also created as an intermediate file. You can create prog.bin with:

nasm -f elf32 prog.asm -o prog.o
ld -melf_i386 -nostdlib -nostartfiles -T prog.ld -o prog.elf prog.o
objcopy -O binary prog.elf prog.bin

Create the bootloader (boot.bin) with:

nasm -f bin boot.asm -o boot.bin
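The last two lines of boot.asm pad the image to 510 bytes and append the word 0xaa55, which is stored little-endian as the bytes 55 AA. A quick check of the assembled file could look like this sketch, where has_boot_signature is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if bytes 510 and 511 of the file are 55 AA.
has_boot_signature() {
    # od prints the two bytes as lowercase hex; tr removes spaces and newlines.
    sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Hypothetical usage after assembling:
#   has_boot_signature boot.bin && echo "boot.bin carries the boot signature"
```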

We can build a disk image (disk.img) that looks like a 1.44MB floppy with:

dd if=/dev/zero of=disk.img bs=1024 count=1440
dd if=boot.bin of=disk.img bs=512 seek=0 conv=notrunc
dd if=commlib.bin of=disk.img bs=512 seek=1 conv=notrunc
dd if=prog.bin of=disk.img bs=512 seek=2 conv=notrunc

In this simple example the common library and the program each fit in a single sector. I have also hardcoded their locations on the disk. This is just a proof of concept, and not meant to represent your final code.
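Because the bootloader reads exactly one 512-byte sector for each file, a binary that grows past that size would be loaded only in part. A guard along these lines could be run before writing the image; fits_in_sectors is a made-up name and the check is only a sketch:

```shell
#!/bin/sh
# Succeed if file $1 fits in $2 sectors of 512 bytes (default: 1 sector,
# matching the single-sector reads in boot.asm).
fits_in_sectors() {
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -le $(( ${2:-1} * 512 )) ]
}

# Hypothetical usage before running dd:
#   for f in commlib.bin prog.bin; do
#       fits_in_sectors "$f" || echo "$f needs more than one sector" >&2
#   done
```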

When I run this in QEMU (BOCHS will also work) using qemu-system-i386 -fda disk.img I get this output:

[screenshot of the QEMU output]


Looking at prog.bin

In the example above we created a prog.bin file that wasn't supposed to have the common library code in it, but had the symbols referring to it resolved. Is that what happened? If you use NDISASM you can disassemble the binary file as 16-bit code with an origin of 0x0000 to see what was generated. Using ndisasm -o 0x0000 -b16 prog.bin you should see something like:

; Text Section
00000000  8CC8              mov ax,cs
00000002  8ED8              mov ds,ax
00000004  8EC0              mov es,ax
00000006  8ED0              mov ss,ax
00000008  31E4              xor sp,sp
0000000A  31DB              xor bx,bx
; Both the calls are to the function in the common library that are loaded
; in a different segment at 0x0100. The linker was able to resolve these
; locations for us.
0000000C  9A14000001        call word 0x100:0x11 ; FAR Call print_banner
00000011  BE2000            mov si,0x20
00000014  9A00000001        call word 0x100:0x0  ; FAR Call print_string
00000019  FA                cli
0000001A  F4                hlt
0000001B  EBFD              jmp short 0x1a       ; Infinite loop
0000001D  6690              xchg eax,eax
0000001F  90                nop
; Data section
; String 'Printing my own text!', 13, 10, 0
00000020  50                push ax
00000021  7269              jc 0x8c
00000023  6E                outsb
00000024  7469              jz 0x8f
00000026  6E                outsb
00000027  67206D79          and [ebp+0x79],ch
0000002B  206F77            and [bx+0x77],ch
0000002E  6E                outsb
0000002F  207465            and [si+0x65],dh
00000032  7874              js 0xa8
00000034  210D              and [di],cx
00000036  0A00              or al,[bx+si]

I have annotated it with a few comments.
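To read the far calls in a listing like this by hand: a 16-bit direct far call is the opcode byte 9A followed by the offset and then the segment, each stored low byte first. The sketch below decodes such a five-byte sequence; decode_far_call is a made-up helper, and the second example uses hypothetical bytes that do not come from the listing.

```shell
#!/bin/sh
# Decode a 16-bit direct far call given as five two-digit hex bytes.
# Prints segment:offset, or fails if the first byte is not opcode 9A.
decode_far_call() {
    case "$1" in
        9A|9a) ;;
        *) return 1 ;;
    esac
    # Bytes 2-3 hold the offset and bytes 4-5 the segment, both little-endian,
    # so each pair is printed in swapped order.
    printf '%s%s:%s%s\n' "$5" "$4" "$3" "$2"
}

decode_far_call 9A 00 00 00 01   # the print_string call: prints 0100:0000
decode_far_call 9A 34 12 00 01   # hypothetical bytes:    prints 0100:1234
```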


Notes

  • Is it required to use FAR Calls? No, but if you don't, all of your code will have to fit in a single segment and the offsets won't be able to overlap. FAR Calls come with some overhead, but they are more flexible and let you make better use of memory below 1MB. Functions called via a FAR Call have to use FAR Returns (retf). Far functions that use pointers passed from other segments generally need to handle both the segment and the offset of those pointers (FAR pointers), not just the offset.
  • With the method in this answer, any time you change the common library you have to re-link all the programs that rely on it, as the absolute memory addresses of exported (public) functions and data may shift.
