I have two NASM-syntax assembly files, let's say a.asm and b.asm.
They will need to be assembled into two separate binary files, a.bin and b.bin.
On startup, a.bin will be loaded by another program to a fixed location in memory (0x1000).
b.bin will be loaded later to an arbitrary location in memory.
b.bin will use some of the functions defined in a.bin.
PROBLEM: b.bin does not know where the functions are located in a.bin.
Why do they need to be separate? They're unrelated; keeping b.bin (and many more files) and a.bin in one file would defeat the purpose of a file system.
Why not %include it? Memory usage: a.bin is quite a large set of functions taking up lots of memory, and because of the 640 KB memory limit in x86 real mode I can't really afford to have this in memory for every file that needs it.
Possible solution 1: just hardcode the locations.
Problem: what if I change something minor at the very start of a.bin? I'll need to update all pointers to stuff after it, and that's not handy.
Possible solution 2: keep track of the function locations in one file, and %include that.
This is probably what I'll do if I have no other options. I might even be able to generate this file automatically if NASM can produce easy-to-parse symbol listings; otherwise it's still too much work.
Possible solution 3: keep a table in memory of where the functions are located, instead of the functions themselves. This also has the added benefit of backwards compatibility: if I do decide to change a.bin, all things using it don't have to change along with it.
Problem: indirect calls are really slow and take up lots of disk space, though really this is a minor issue. The table will also take up some space on disk and in memory.
My idea was to add this later, as a library or something like that. Everything that's assembled along with a.bin can call it faster by using direct calls, and things that are assembled separately, e.g. applications, can use the table for slower but safer access to a.bin.
TL;DR:
How can I include labels from another asm file so that they can be called without including the actual code in the final assembled file?
You could proceed like this:
1. Assemble a.bin to be loaded from address 0x1000.
2. Use the nm utility (or similar) to dump the symbol table of a.bin.
3. Write a script to turn the symbol table into an assembly file asyms.asm that contains, for each symbol in a.bin, a line of the form
sym EQU addr
where addr is the actual address of sym as given by nm.
4. Include asyms.asm when assembling b.bin. This makes the addresses of the symbols in a.bin visible to your assembler code without pulling in the corresponding code.
What you are trying to do is known as building an overlay. I believe some assemblers and linkers do have support for this sort of thing, but I am not sure about the details.
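The script step could be sketched roughly as follows in Python. This is only an illustration under several assumptions: the input is the default (BSD-style) text output of nm, run on an object or ELF file that still carries a symbol table (a flat binary has none); the addresses already include the 0x1000 load address; and the function name nm_to_equ and the sample symbols are made up for the example.

```python
import re

# One line of default (BSD-style) nm output: "<hex address> <type letter> <name>".
NM_LINE = re.compile(r"^([0-9A-Fa-f]+)\s+([A-Za-z])\s+(\S+)$")


def nm_to_equ(nm_output, wanted_types="TDBR"):
    """Turn nm text into NASM "name EQU 0xADDR" lines.

    Only symbols whose type letter is in wanted_types are kept. With the
    default this means global text/data/bss/read-only symbols; lower-case
    (local) types are dropped. Undefined symbols have no address column,
    so they never match the pattern and are skipped.
    """
    lines = []
    for raw in nm_output.splitlines():
        match = NM_LINE.match(raw.strip())
        if not match:
            continue
        addr, sym_type, name = match.groups()
        if sym_type not in wanted_types:
            continue
        lines.append(f"{name} EQU 0x{int(addr, 16):04X}")
    return "\n".join(lines) + "\n"


# Hypothetical nm output; the names and addresses are invented for illustration.
sample = """\
00001000 T print_string
00001011 T print_banner
00001040 d local_buffer
         U some_extern
"""

print(nm_to_equ(sample), end="")
```

The printed text would be saved as asyms.asm and pulled into b.asm with %include. If the symbol addresses are relative to the start of a.bin rather than absolute, the load address would have to be added before formatting.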
You have a number of possibilities. This answer focuses on a hybrid of 1 and 2. Although you can create a table of function pointers, we can use direct calls to the routines in a common library by symbol name without copying the common library routines into each program. The method I would use is to utilize the power of LD and linker scripts to create a shared library that has a static location in memory and is accessed via FAR CALLs (segment and offset form the function address) from independent program(s) loaded elsewhere in RAM.
Most people when they start out create a linker script that produces a copy of all the input sections in the output. It is possible to create output sections that never appear (not LOADed) in the output file but the linker can still use the symbols of those nonloaded sections to resolve symbol addresses.
I've created a simple common library with print_banner and print_string functions that use BIOS functions to print to the console. Both are assumed to be called via FAR CALLs from other segments. You may have your common library loaded at 0x0100:0x0000 (physical address 0x01000) but called from code in other segments like 0x2000:0x0000 (physical address 0x20000). A sample commlib.asm file could look like:
bits 16

extern __COMMONSEG
global print_string
global print_banner
global _startcomm

section .text

; Function: print_string
;           Display a string to the console on specified display page
; Type:     FAR
;
; Inputs:   ES:SI = Address of string to print
;           BH = Display page
; Clobbers: AX, SI
; Return:   Nothing

print_string:               ; Routine: output string at ES:SI to screen
    mov ah, 0x0e            ; BIOS tty Print
    jmp .getch
.repeat:
    int 0x10                ; print character
.getch:
    mov al, [es:si]         ; Get character from string
    inc si                  ; Advance pointer to next character
    test al, al             ; Have we reached end of string?
    jnz .repeat             ; if not process next character
.end:
    retf                    ; Important: Far return

; Function: print_banner
;           Display a banner to the console on specified display page
; Type:     FAR
; Inputs:   BH = Display page
; Clobbers: AX, SI
; Return:   Nothing

print_banner:
    push es                 ; Save ES
    push cs
    pop es                  ; ES = CS
    mov si, bannermsg       ; SI = String to print
                            ; Far call to print_string
    call __COMMONSEG:print_string
    pop es                  ; Restore ES
    retf                    ; Important: Far return

_startcomm:                 ; Keep linker quiet by defining this

section .data
bannermsg: db "Welcome to this Library!", 13, 10, 0
We need a linker script that allows us to create a file that we can eventually load into memory. This code assumes the segment the library will be loaded at is 0x0100 and offset 0x0000 (physical address 0x01000):
commlib.ld
OUTPUT_FORMAT("elf32-i386");
ENTRY(_startcomm);

/* Common Library at 0x0100:0x0000 = physical address 0x1000 */
__COMMONSEG    = 0x0100;
__COMMONOFFSET = 0x0000;

SECTIONS
{
    . = __COMMONOFFSET;

    /* Code and data for common library at VMA = __COMMONOFFSET */
    .commlib : SUBALIGN(4) {
        *(.text)
        *(.rodata*)
        *(.data)
        *(.bss)
    }

    /* Remove unnecessary sections */
    /DISCARD/ : {
        *(.eh_frame);
        *(.comment);
    }
}
It is pretty simple: it effectively links a file commlib.o so that it can eventually be loaded at 0x0100:0x0000. A sample program that uses this library could look like:
prog.asm :
extern __COMMONSEG
extern print_banner
extern print_string
global _start

bits 16

section .text
_start:
    mov ax, cs              ; DS=ES=CS
    mov ds, ax
    mov es, ax
    mov ss, ax              ; SS:SP=CS:0x0000
    xor sp, sp

    xor bx, bx              ; BH = page 0 to display on
    call __COMMONSEG:print_banner   ; FAR Call
    mov si, mymsg           ; String to display ES:SI
    call __COMMONSEG:print_string   ; FAR Call

    cli
.endloop:
    hlt
    jmp .endloop

section .data
mymsg: db "Printing my own text!", 13, 10, 0
The trick now is to make a linker script that can take a program like this and reference the symbols in our common library without actually adding the common library code again. This can be achieved by using the NOLOAD type on an output section in a linker script.
prog.ld :
OUTPUT_FORMAT("elf32-i386");
ENTRY(_start);

__PROGOFFSET = 0x0000;

/* Load the commlib.elf file to access all its symbols */
INPUT(commlib.elf)

SECTIONS
{
    /* NOLOAD type prevents the actual code from being loaded into memory
       which means if you create a BINARY file from this, this section will
       not appear */
    . = __COMMONOFFSET;
    .commlib (NOLOAD) : {
        commlib.elf(.commlib);
    }

    /* Code and data for program at VMA = __PROGOFFSET */
    . = __PROGOFFSET;
    .prog : SUBALIGN(4) {
        *(.text)
        *(.rodata*)
        *(.data)
        *(.bss)
    }

    /* Remove unnecessary sections */
    /DISCARD/ : {
        *(.eh_frame);
        *(.comment);
    }
}
The common library's ELF file is loaded by the linker and the .commlib section is marked with a (NOLOAD) type. This will prevent a final program from including the common library functions and data, but allows us to still reference the symbol addresses.
A simple test harness can be created as a bootloader. The bootloader will load the common library to 0x0100:0x0000 (physical address 0x01000), and the program that uses it is loaded to 0x2000:0x0000 (physical address 0x20000). The program address is arbitrary; I picked it because it is in free memory below 1MB.
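The physical addresses quoted here follow from the real-mode rule that a segment value is multiplied by 16 and added to the offset. A small Python sketch of that arithmetic, where the helper name physical_address is just an illustrative choice:

```python
def physical_address(segment, offset):
    """Real-mode translation: physical = segment * 16 + offset."""
    return (segment << 4) + offset


# The two load addresses used in this answer, plus the bootloader's own origin.
for seg, off in [(0x0100, 0x0000), (0x2000, 0x0000), (0x0000, 0x7C00)]:
    print(f"{seg:04X}:{off:04X} -> {physical_address(seg, off):#07x}")
```

Different segment:offset pairs can name the same byte (0x07C0:0x0000 and 0x0000:0x7C00 both give 0x7C00), which is why the library has to agree with its callers on one particular segment value, here __COMMONSEG.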
boot.asm :
org 0x7c00
bits 16

start:
    ; DL = boot drive number from BIOS

    ; Set up stack and segment registers
    xor ax, ax              ; DS = 0x0000
    mov ds, ax
    mov ss, ax              ; SS:SP=0x0000:0x7c00 below bootloader
    mov sp, 0x7c00
    cld                     ; Set direction flag forward for String instructions

    ; Reset drive
    xor ax, ax
    int 0x13

    ; Read 2nd sector (commlib.bin) to 0x0100:0x0000 = phys addr 0x01000
    mov ah, 0x02            ; Drive READ subfunction
    mov al, 0x01            ; Read one sector
    mov bx, 0x0100
    mov es, bx              ; ES=0x0100
    xor bx, bx              ; ES:BX = 0x0100:0x0000 = phys address 0x01000
    mov cx, 0x0002          ; CH = Cylinder = 0, CL = Sector # = 2
    xor dh, dh              ; DH = Head = 0
    int 0x13

    ; Read 3rd sector (prog.bin) to 0x2000:0x0000 = phys addr 0x20000
    mov ah, 0x02            ; Drive READ subfunction
    mov al, 0x01            ; Read one sector
    mov bx, 0x2000
    mov es, bx              ; ES=0x2000
    xor bx, bx              ; ES:BX = 0x2000:0x0000 = phys address 0x20000
    mov cx, 0x0003          ; CH = Cylinder = 0, CL = Sector # = 3
    xor dh, dh              ; DH = Head = 0
    int 0x13

    ; Jump to the entry point of our program
    jmp 0x2000:0x0000

times 510-($-$$) db 0
dw 0xaa55
After the bootloader loads the common library (the second sector on disk, LBA 1) and the program (the third sector, LBA 2) into memory, it jumps to the entry point of the program at 0x2000:0x0000.
We can create the file commlib.bin with:
nasm -f elf32 commlib.asm -o commlib.o
ld -melf_i386 -nostdlib -nostartfiles -T commlib.ld -o commlib.elf commlib.o
objcopy -O binary commlib.elf commlib.bin
commlib.elf is also created as an intermediate file. You can create prog.bin with:
nasm -f elf32 prog.asm -o prog.o
ld -melf_i386 -nostdlib -nostartfiles -T prog.ld -o prog.elf prog.o
objcopy -O binary prog.elf prog.bin
Create the bootloader (boot.bin) with:
nasm -f bin boot.asm -o boot.bin
We can build a disk image (disk.img) that looks like a 1.44MB floppy with:
dd if=/dev/zero of=disk.img bs=1024 count=1440
dd if=boot.bin of=disk.img bs=512 seek=0 conv=notrunc
dd if=commlib.bin of=disk.img bs=512 seek=1 conv=notrunc
dd if=prog.bin of=disk.img bs=512 seek=2 conv=notrunc
This simple example can fit the common library and program in single sectors. I have also hard coded their locations on the disk. This is just a proof of concept, and not meant to represent your final code.
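Because the proof-of-concept bootloader reads exactly one sector for each file, it can help to check that nothing spills into the next sector while the image is assembled. The following Python sketch mirrors the dd commands above; the function name build_image and the stand-in byte strings are assumptions for illustration, and a real run would read boot.bin, commlib.bin and prog.bin from disk instead.

```python
SECTOR_SIZE = 512
FLOPPY_SECTORS = 2880  # 1.44MB floppy: 2880 sectors of 512 bytes (1440 KiB)


def build_image(parts, total_sectors=FLOPPY_SECTORS):
    """Place each blob at its starting sector (LBA) in a zero-filled image.

    parts maps a starting sector number to the bytes stored there.
    Raises ValueError if a blob overlaps the next one or runs off the disk.
    """
    image = bytearray(total_sectors * SECTOR_SIZE)
    used_until = 0
    for lba in sorted(parts):
        blob = parts[lba]
        start = lba * SECTOR_SIZE
        end = start + len(blob)
        if start < used_until:
            raise ValueError(f"blob at sector {lba} overlaps the previous one")
        if end > len(image):
            raise ValueError(f"blob at sector {lba} runs past the end of the disk")
        image[start:end] = blob
        used_until = end
    return bytes(image)


# Stand-in contents, not the real assembled files.
boot = bytes(510) + b"\x55\xaa"          # boot signature as stored by dw 0xaa55
commlib = b"LIBRARY".ljust(64, b"\x00")
prog = b"PROGRAM".ljust(48, b"\x00")

image = build_image({0: boot, 1: commlib, 2: prog})
print(len(image))  # 1474560
```

With this layout a commlib.bin larger than 512 bytes would be rejected, because it would overlap prog.bin at sector 2. That is the same limit the hard-coded single-sector reads in the bootloader impose.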
When I run this in QEMU (BOCHS will also work) using qemu-system-i386 -fda disk.img, the library banner "Welcome to this Library!" is printed, followed by "Printing my own text!".
In the example above we created a prog.bin file that wasn't supposed to have the common library code in it, but had symbols to it resolved. Is that what happened? If you use NDISASM you can disassemble the binary file as 16-bit code with an origin point of 0x0000 to see what was generated. Using ndisasm -o 0x0000 -b16 prog.bin you should see something like:
; Text Section
00000000  8CC8              mov ax,cs
00000002  8ED8              mov ds,ax
00000004  8EC0              mov es,ax
00000006  8ED0              mov ss,ax
00000008  31E4              xor sp,sp
0000000A  31DB              xor bx,bx
; Both calls are to functions in the common library, which is loaded
; in a different segment at 0x0100. The linker was able to resolve these
; locations for us.
0000000C  9A14000001        call word 0x100:0x11    ; FAR Call print_banner
00000011  BE2000            mov si,0x20
00000014  9A00000001        call word 0x100:0x0     ; FAR Call print_string
00000019  FA                cli
0000001A  F4                hlt
0000001B  EBFD              jmp short 0x1a          ; Infinite loop
0000001D  6690              xchg eax,eax
0000001F  90                nop
; Data section
; String 'Printing my own text!', 13, 10, 0
00000020  50                push ax
00000021  7269              jc 0x8c
00000023  6E                outsb
00000024  7469              jz 0x8f
00000026  6E                outsb
00000027  67206D79          and [ebp+0x79],ch
0000002B  206F77            and [bx+0x77],ch
0000002E  6E                outsb
0000002F  207465            and [si+0x65],dh
00000032  7874              js 0xa8
00000034  210D              and [di],cx
00000036  0A00              or al,[bx+si]
I have annotated it with a few comments.
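To double-check a far call in such a listing by hand, recall that the 16-bit call ptr16:16 form is opcode 0x9A followed by the offset and then the segment, each little-endian. A minimal Python sketch that decodes the bytes of the print_string call from the listing (the helper name decode_far_call is an assumption for this example):

```python
import struct


def decode_far_call(code, pos=0):
    """Decode a 16-bit "call ptr16:16" (opcode 0x9A) found at pos.

    Returns (segment, offset). The operand is stored offset first,
    then segment, both as little-endian 16-bit values.
    """
    if code[pos] != 0x9A:
        raise ValueError("not a far call opcode")
    offset, segment = struct.unpack_from("<HH", code, pos + 1)
    return segment, offset


# Bytes of the print_string call at 0x14 in the listing.
call_bytes = bytes.fromhex("9A00000001")
segment, offset = decode_far_call(call_bytes)
print(f"call {segment:#06x}:{offset:#06x}")  # call 0x0100:0x0000
```

The decoded segment 0x0100 matches __COMMONSEG, so the call leaves the program's own segment and lands in the separately loaded library.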
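Note that functions in the common library are reached via FAR CALLs, so they must end with a far return (retf). Far functions that use pointers passed from other segments generally need to handle both the segment and the offset of those pointers (FAR pointers), not just the offset.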