Referencing symbols of code/data loaded separately to another part of memory

Question

I have two NASM-syntax assembly files, say a.asm and b.asm.
They need to be assembled into two separate binary files, a.bin and b.bin.
On startup, a.bin is loaded by another program to a fixed location in memory (0x1000).
b.bin is loaded later to an arbitrary location in memory.
b.bin uses some of the functions defined in a.bin.
PROBLEM: b.bin does not know where those functions are located in a.bin.

Why do they need to be separate? They're unrelated; keeping b.bin (and many more files) and a.bin in one file would defeat the purpose of a file system.

Why not %include it? Memory usage. a.bin is quite a large set of functions that takes up a lot of memory, and because of the 640 KB memory limit in x86 real mode I can't afford to have a copy in memory for every file that needs it.

Possible solution 1: just hardcode the locations.
Problem: if I change something minor at the very start of a.bin, I'll need to update every pointer to anything after it, which isn't practical.

Possible solution 2: keep track of the function locations in one file, and %include that.
This is probably what I'll do if I have no other options. I might even be able to generate this file automatically if NASM can produce easy-to-parse symbol listings; otherwise it's still too much work.

Possible solution 3: keep a table in memory that records where the functions are located, and have callers go through the table instead of calling the functions directly. This has the added benefit of backwards compatibility: if I do decide to change a.bin, the things using it don't have to change along with it.
Problem: indirect calls are slow and take up more disk space, though this is really a minor issue. The table also takes up some space on disk and in memory.
My idea was to add this later, as a library or something like that. Everything assembled along with a.bin could then call it faster using direct calls, and things assembled separately (e.g. applications) could use the table for slower but safer access to a.bin.

TL;DR:
How can I include labels from another asm file so that they can be called without including the actual code in the final assembled file?

Answer 1

You could proceed like this:

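Assemble and link a.bin so that it is loaded at address 0x1000.
Use the nm utility (or similar) to dump the symbol table of a.bin. nm needs a file that still carries a symbol table (an object or ELF file), so run it on the linked intermediate file before that is converted to a flat binary.
Write a script to turn the symbol table into an assembly file asyms.asm that contains, for each symbol in a.bin, a line of the form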
```
 sym EQU addr 
```
where addr is the actual address of sym as reported by nm.
Include or link asyms.asm when assembling b.bin. This makes the addresses of the symbols in a.bin visible to your assembler code without pulling in the corresponding code.
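
The script from the third step could look something like the following Python sketch. It assumes nm's default output of three whitespace-separated columns (address, type letter, name) and keeps only global symbols. The function name nm_to_equ, the type filter and the sample addresses are made up for illustration; they are not taken from a real a.bin.

```python
def nm_to_equ(nm_output: str, wanted_types: str = "TDBR") -> str:
    """Turn nm lines ("ADDRESS TYPE NAME") into NASM "name EQU 0xADDRESS" lines."""
    lines = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column; skip them
        addr, sym_type, name = parts
        if sym_type not in wanted_types:
            continue  # keep only global text/data/bss/read-only symbols
        lines.append(f"{name} EQU 0x{int(addr, 16):04x}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical nm output; real addresses depend on your a.asm.
    sample = (
        "00001000 T print_string\n"
        "00001014 T print_banner\n"
        "00001020 d local_buffer\n"
        "         U some_extern\n"
    )
    print(nm_to_equ(sample), end="")
```

In practice you would feed it the real nm output and write the result to asyms.asm, regenerating that file every time a.bin is rebuilt.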

What you are trying to do is known as building an overlay. I believe some assemblers and linkers do have support for this sort of thing but I am not sure about the details.

Answer 2

You have a number of possibilities. This answer focuses on a hybrid of your solutions 1 and 2. Although you could create a table of function pointers, it is possible to call the routines in a common library directly by symbol name without copying the library routines into each program. The method I use relies on LD and linker scripts to create a shared library with a static location in memory. Independent program(s) loaded elsewhere in RAM reach it via FAR CALLs, where the function address is given in segment:offset form.

Most people, when they start out, create a linker script that copies all the input sections into the output. It is also possible to create output sections that never appear in the output file (they are not LOADed), while the linker still uses the symbols of those non-loaded sections to resolve symbol addresses.

I've created a simple common library with a print_banner and a print_string function that use BIOS functions to print to the console. Both are assumed to be called via FAR CALLs from other segments. You may have your common library loaded at 0x0100:0x0000 (physical address 0x01000) but called from code in other segments, such as 0x2000:0x0000 (physical address 0x20000). A sample commlib.asm file could look like:

bits 16

extern __COMMONSEG
global print_string
global print_banner
global _startcomm

section .text

; Function: print_string
;           Display a string to the console on the specified display page
; Type:     FAR
;
; Inputs:   ES:SI = Address of the NUL-terminated string to print
;           BH = Display page (BL = foreground color in graphics modes)
; Clobbers: AX, SI
; Return:   Nothing

print_string:               ; Routine: output string at ES:SI to screen
    mov ah, 0x0e            ; BIOS tty Print
    jmp .getch
.repeat:
    int 0x10                ; print character
.getch:
    mov al, [es:si]         ; Get character from string
    inc si                  ; Advance pointer to next character
    test al,al              ; Have we reached end of string?
    jnz .repeat             ;     if not process next character
.end:
    retf                    ; Important: Far return

; Function: print_banner
;           Display a banner to the console on the specified display page
; Type:     FAR
; Inputs:   BH = Display page (BL = foreground color in graphics modes)
; Clobbers: AX, SI
; Return:   Nothing

print_banner:
    push es                 ; Save ES
    push cs
    pop es                  ; ES = CS
    mov si, bannermsg       ; SI = String to print
                            ; Far call to print_string
    call __COMMONSEG:print_string
    pop es                  ; Restore ES
    retf                    ; Important: Far return

_startcomm:                 ; Keep linker quiet by defining this

section .data
bannermsg: db "Welcome to this Library!", 13, 10, 0

We need a linker script that allows us to create a file that we can eventually load into memory. This code assumes the segment the library will be loaded at is 0x0100 and offset 0x0000 (physical address 0x01000):

commlib.ld

OUTPUT_FORMAT("elf32-i386");
ENTRY(_startcomm);

/* Common Library at 0x0100:0x0000 = physical address 0x1000 */
__COMMONSEG    = 0x0100;
__COMMONOFFSET = 0x0000;

SECTIONS
{
    . = __COMMONOFFSET;

    /* Code and data for common library at VMA = __COMMONOFFSET */
    .commlib  : SUBALIGN(4) {
        *(.text)
        *(.rodata*)
        *(.data)
        *(.bss)
    }

    /* Remove unnecessary sections */
    /DISCARD/ : {
        *(.eh_frame);
        *(.comment);
    }
}

It is pretty simple: it links commlib.o so that the result can eventually be loaded at 0x0100:0x0000. A sample program that uses this library could look like:

prog.asm :

extern __COMMONSEG
extern print_banner
extern print_string
global _start

bits 16

section .text
_start:
    mov ax, cs                   ; DS=ES=CS
    mov ds, ax
    mov es, ax
    mov ss, ax                   ; SS:SP=CS:0x0000
    xor sp, sp

    xor bx, bx                   ; BH = page 0 to display on (BL = 0)
    call __COMMONSEG:print_banner; FAR Call
    mov si, mymsg                ; String to display ES:SI
    call __COMMONSEG:print_string; FAR Call

    cli
.endloop:
    hlt
    jmp .endloop

section .data
mymsg: db "Printing my own text!", 13, 10, 0

The trick now is to make a linker script that can take a program like this and reference the symbols in our common library without actually adding the common library code again. This can be achieved by using the NOLOAD type on an output section in a linker script.

prog.ld :

OUTPUT_FORMAT("elf32-i386");
ENTRY(_start);

__PROGOFFSET   = 0x0000;

/* Load the commlib.elf file to access all its symbols */
INPUT(commlib.elf)

SECTIONS
{
    /* NOLOAD type prevents the actual code from being loaded into memory
       which means if you create a BINARY file from this, this section will
       not appear */
    . = __COMMONOFFSET;
    .commlib (NOLOAD) : {
        commlib.elf(.commlib);
    }

    /* Code and data for program at VMA = __PROGOFFSET */
    . = __PROGOFFSET;
    .prog : SUBALIGN(4) {
        *(.text)
        *(.rodata*)
        *(.data)
        *(.bss)
    }

    /* Remove unnecessary sections */
    /DISCARD/ : {
        *(.eh_frame);
        *(.comment);
    }
}

The common library's ELF file is loaded by the linker and the .commlib section is marked with a (NOLOAD) type. This will prevent a final program from including the common library functions and data, but allows us to still reference the symbol addresses.

A simple test harness can be created as a bootloader. The bootloader loads the common library to 0x0100:0x0000 (physical address 0x01000), and the program that uses it to 0x2000:0x0000 (physical address 0x20000). The program address is arbitrary; I just picked it because it is in free memory below 1MB.
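
The segment:offset pairs quoted throughout this answer map to physical addresses with the usual real-mode rule (segment times 16, plus offset). A small Python sketch of that arithmetic, using a helper name (physical_address) that is purely illustrative:

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


if __name__ == "__main__":
    # The three locations used by the test harness in this answer.
    for name, seg, off in [
        ("bootloader", 0x0000, 0x7C00),
        ("commlib", 0x0100, 0x0000),
        ("prog", 0x2000, 0x0000),
    ]:
        print(f"{name}: {seg:04x}:{off:04x} -> {physical_address(seg, off):#07x}")
```

The same physical address can be written with different segment:offset pairs. For example, 0x0000:0x1000 and 0x0100:0x0000 both refer to physical address 0x01000, which is why the library can be linked at offset 0x0000 even though it is loaded at 0x1000.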

boot.asm :

org 0x7c00
bits 16

start:
    ; DL = boot drive number from BIOS

    ; Set up stack and segment registers
    xor ax, ax               ; DS = 0x0000
    mov ds, ax
    mov ss, ax               ; SS:SP=0x0000:0x7c00 below bootloader
    mov sp, 0x7c00
    cld                      ; Clear direction flag so string instructions move forward

    ; Reset drive
    xor ax, ax
    int 0x13

    ; Read 2nd sector (commlib.bin) to 0x0100:0x0000 = phys addr 0x01000
    mov ah, 0x02             ; Drive READ subfunction
    mov al, 0x01             ; Read one sector
    mov bx, 0x0100
    mov es, bx               ; ES=0x0100
    xor bx, bx               ; ES:BX = 0x0100:0x0000 = phys address 0x01000
    mov cx, 0x0002           ; CH = Cylinder = 0, CL = Sector # = 2
    xor dh, dh               ; DH = Head = 0
    int 0x13

    ; Read 3rd sector (prog.bin) to 0x2000:0x0000 = phys addr 0x20000
    mov ah, 0x02             ; Drive READ subfunction
    mov al, 0x01             ; Read one sector
    mov bx, 0x2000
    mov es, bx               ; ES=0x2000
    xor bx, bx               ; ES:BX = 0x2000:0x0000 = phys address 0x20000
    mov cx, 0x0003           ; CH = Cylinder = 0, CL = Sector # = 3
    xor dh, dh               ; DH = Head = 0
    int 0x13

    ; Jump to the entry point of our program
    jmp 0x2000:0x0000

    times 510-($-$$) db 0
    dw 0xaa55

After the bootloader loads the common library (second sector of the disk) and the program (third sector) into memory, it jumps to the entry point of the program at 0x2000:0x0000.

Putting it All Together

We can create the file commlib.bin with:

nasm -f elf32 commlib.asm -o commlib.o
ld -melf_i386 -nostdlib -nostartfiles -T commlib.ld -o commlib.elf commlib.o
objcopy -O binary commlib.elf commlib.bin

commlib.elf is also created as an intermediate file. You can create prog.bin with:

nasm -f elf32 prog.asm -o prog.o
ld -melf_i386 -nostdlib -nostartfiles -T prog.ld -o prog.elf prog.o
objcopy -O binary prog.elf prog.bin

Create the bootloader ( boot.bin ) with:

nasm -f bin boot.asm -o boot.bin

We can build a disk image ( disk.img ) that looks like a 1.44MB floppy with:

dd if=/dev/zero of=disk.img bs=1024 count=1440
dd if=boot.bin of=disk.img bs=512 seek=0 conv=notrunc
dd if=commlib.bin of=disk.img bs=512 seek=1 conv=notrunc
dd if=prog.bin of=disk.img bs=512 seek=2 conv=notrunc

This simple example can fit the common library and program in single sectors. I have also hard coded their locations on the disk. This is just a proof of concept, and not meant to represent your final code.
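
If dd is not available, the same image layout can be produced with a few lines of Python. This is only a sketch that mirrors the dd commands above: the helper name build_image is made up, and the one-sector size check reflects the fact that the sample bootloader reads exactly one sector per file.

```python
from pathlib import Path

SECTOR_SIZE = 512
IMAGE_SIZE = 1440 * 1024  # same size as dd bs=1024 count=1440


def build_image(parts: list[tuple[int, bytes]]) -> bytes:
    """Place each (sector_index, data) blob into a zero-filled floppy image."""
    image = bytearray(IMAGE_SIZE)
    for index, data in parts:
        if len(data) > SECTOR_SIZE:
            # The sample bootloader only reads one sector per file.
            raise ValueError(f"blob for sector {index} is {len(data)} bytes")
        start = index * SECTOR_SIZE  # equivalent of dd seek=index bs=512
        image[start:start + len(data)] = data
    return bytes(image)


if __name__ == "__main__":
    names = ["boot.bin", "commlib.bin", "prog.bin"]  # sectors 0, 1, 2
    if all(Path(n).exists() for n in names):
        blobs = [(i, Path(n).read_bytes()) for i, n in enumerate(names)]
        Path("disk.img").write_bytes(build_image(blobs))
```

Note that dd counts sectors from 0 (seek=1, seek=2), while the BIOS read call in boot.asm counts CHS sectors from 1, so seek=1 corresponds to sector number 2 in the bootloader.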

When I run this in QEMU (BOCHS will also work) using qemu-system-i386 -fda disk.img, the library banner is printed, followed by the program's own message.

Looking at prog.bin

In the example above we created a prog.bin file that wasn't supposed to have the common library code in it, but that still had the library's symbols resolved. Is that what happened? With NDISASM you can disassemble the binary file as 16-bit code with an origin point of 0x0000 to see what was generated. Using ndisasm -o 0x0000 -b16 prog.bin you should see something like:

; Text Section
00000000  8CC8              mov ax,cs
00000002  8ED8              mov ds,ax
00000004  8EC0              mov es,ax
00000006  8ED0              mov ss,ax
00000008  31E4              xor sp,sp
0000000A  31DB              xor bx,bx
; Both calls are to functions in the common library, which is loaded
; in a different segment at 0x0100. The linker was able to resolve these
; locations for us.
0000000C  9A14000001        call word 0x100:0x14    ; FAR Call print_banner
00000011  BE2000            mov si,0x20
00000014  9A00000001        call word 0x100:0x0     ; FAR Call print_string
00000019  FA                cli
0000001A  F4                hlt
0000001B  EBFD              jmp short 0x1a          ; Infinite loop
0000001D  6690              xchg eax,eax
0000001F  90                nop
; Data section
; String 'Printing my own text!', 13, 10, 0
00000020  50                push ax
00000021  7269              jc 0x8c
00000023  6E                outsb
00000024  7469              jz 0x8f
00000026  6E                outsb
00000027  67206D79          and [ebp+0x79],ch
0000002B  206F77            and [bx+0x77],ch
0000002E  6E                outsb
0000002F  207465            and [si+0x65],dh
00000032  7874              js 0xa8
00000034  210D              and [di],cx
00000036  0A00              or al,[bx+si]

I have annotated it with a few comments.
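
To double-check what the linker resolved, the raw bytes of the two far calls in the listing can be decoded by hand. In 16-bit code a direct far call is opcode 0x9A followed by a little-endian 16-bit offset and a little-endian 16-bit segment. The Python sketch below does that decoding; the function name decode_far_call is only illustrative.

```python
import struct


def decode_far_call(encoded: bytes) -> tuple[int, int]:
    """Decode a 16-bit direct far call (opcode 0x9A) into (segment, offset)."""
    opcode, offset, segment = struct.unpack("<BHH", encoded)
    if opcode != 0x9A:
        raise ValueError("not a direct far call")
    return segment, offset


if __name__ == "__main__":
    # Byte sequences copied from the ndisasm listing of prog.bin.
    for hex_bytes in ("9A14000001", "9A00000001"):
        seg, off = decode_far_call(bytes.fromhex(hex_bytes))
        print(f"{hex_bytes}: {seg:#06x}:{off:#06x}")
```

Both calls decode to segment 0x0100, the value of __COMMONSEG from commlib.ld, which shows that prog.bin contains only the resolved addresses and none of the library code.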

Notes

Is it required to use FAR Calls? No, but if you don't then all of your code will have to fit in a single segment and the offsets won't be able to overlap. Using FAR Calls comes with some overhead but they are more flexible allowing you to better utilize memory below 1MB. Functions called via a FAR Call have to use FAR Returns ( retf ). Far functions that use pointers passed from other segments generally need to handle segment and offset of pointers (FAR pointers), not just the offset.
Using the method in this answer: anytime you make a change to the common library you have to re-link all the programs that rely on it, as the absolute memory addresses for exported (public) functions and data may shift.

solution1
3 2018-03-23 14:41:19

solution2
3 ACCEPTED 2018-03-23 22:02:38