简体   繁体   中英

What are the segment and offset in real mode memory addressing?

I am reading about memory addressing. I read about segment offset and then about descriptor offset. I know how to calculate the exact addresses in real mode. All this is OK, but I am unable to understand what exactly offset is? Everywhere I read:

In order to allow addressing of more memory, addresses are calculated from segment * 16 + offset<\/code> .

We have 16 bits, so we can address up to 2^16 = 64k.

What the segment represent? Why we multiply it with 16? why we add offset. I just can't understand what this offset is? Can anybody explain me or give me link for this please?

When Intel was building the 8086, there was a valid case for having more than 64KB in a machine, but there was no way it'd ever use a 32-bit address space. Back then, even a megabyte was a whole lot of memory. (Remember the infamous quote "640K ought to be enough for anybody"? It's essentially a mistranslation of the fact that back then, 1MB was freaking huge<\/em> .) The word "gigabyte" wouldn't be in common use for another 15-20 years, and it wouldn't be referring to RAM for like another 5-10 years after that.

They still used 16-bit words for addresses, because after all, this is a 16-bit processor. The upper word was the "segment" and the lower word was the "offset". The two parts overlapped considerably, though -- a "segment" is a 64KB chunk of memory that starts at (segment) * 16<\/code> , and the "offset" can point anywhere within that chunk. In order to calculate the actual address, you multiply the segment part of the address by 16 (or shift it left by 4 bits...same thing), and then add the offset. When you're done, you have a 20-bit address.

 19           4  0
  +--+--+--+--+
  |  segment  |
  +--+--+--+--+--+
     |   offset  |
     +--+--+--+--+

Under x86 Real-Mode Memory the physical address is 20 bit long and is therefore calculated as:

PhysicalAddress = Segment * 16 + Offset

I want to add an answer here just because I've been scouring the internet trying to understand this too. The other answers were leaving out a key piece of information that I did get from the link presented in one of the answers. However, I almost totally missed it. Reading through the linked page, I still wasn't understanding how this was working.

The problem I was probably having was from myself only really understanding how the Commodore 64 (6502 processor) laid out memory. It uses similar notation to address memory. It has 64k of total memory, and uses 8-bit values of PAGE:OFFSET to access memory. Each page is 256 bytes long (8-bit number) and the offset points to one of values in that page. Pages are spaced back-to-back in memory. So page 2 starts where page 1 ends. I was going into the 386 thinking the same style. This is not so.

Real mode is using a similar style even if it is different wording SEGMENT:OFFSET. A segment is 64k in size. However, the segments themselves are not laid out back-to-back like the Commodore was. They are spaced 16 bytes apart from each other. Offset still operates the same, indicating how many bytes from the page\\segment start.

I hope this explanation helps anyone else who finds this question, it has helped me in writing it.

I can see the question and answers are some years old, but there is a wrong statement that there are only 16 bit registers exist within the real mode.

Every of these 8 bit register is a part of a 16 bit register which are divided into a lower and a higher part of a 16 bit register.

Within the real mode the default operand-size and address-size is 16 bit. With these both instruction prefixes we can use a 32 bit operand\/register example for to calculate a 32 bit value in one 32 bit register, or for to move a 32 bit value to and from a memmory location. And we can use all 32 bit registers(maybe in combination with a base+index*scale+displacement) as an address-register, but the sum of the effective address do not have to be exceed the limit of the 64 kb segment-size.


But this is totaly wrong, because in the Intel manual we can find this statement: "These prefixes can be used in real-address mode as well as in protected mode and virtual-8086 mode".)


Starting with a Pentium 3 we become eight 128 bit XMM-Registers.
..

Minimal example

With:

  • offset = msg
  • segment = ds
mov $0, %ax
mov %ax, %ds
mov %ds:msg, %al
/* %al contains 1 */

mov $1, %ax
mov %ax, %ds
mov %ds:msg, %al
/* %al contains 2: 1 * 16 bytes forward. */

msg:
.byte 1
.fill 15
.byte 2

So if you want to access memory above 64k:

mov $0xF000, %ax
mov %ax, %ds

Note that this allows for addresses larger than 20 bits wide if you use something like:

0x10 * 0xFFFF + 0xFFFF == 0x10FFEF

On earlier processors which had only 20 address wires, it was simply truncated, but later on things got complicated with the A20 line (21st address wire): https://en.wikipedia.org/wiki/A20_line

On a GitHub repo with the required boilerplate to run it.

A 16-bit register can only address up to 0xFFFF (65,536 bytes, 64KB). When that wasn't enough, Intel added segment registers.

Any logical design would have simply combined two 16-bit registers to make a 32-bit address space, (eg 0xFFFF : 0xFFFF = 0xFFFFFFFF ), but nooooo ... Intel had to get all weird on us.

Historically, the frontside bus (FSB) only had 20 address lines, and thus could only transmit 20-bit addresses. To "rectify" this, Intel devised a scheme in which segment registers only extend your address by 4-bits (16bits + 4 = 20, in theory).

To achieve this, the segment register is left-shifted from its original value by 4-bits, then added to the address in your general register (eg [es:ax] = ( es << 4 ) + ax ) . Note: Left shifting 4 bits is equivalent to multiplying by 16 .

That's it. Here's some illustrative examples:

;; everything's hexadecimal

[ 0:1 ] = 1

[ F:1 ] = F1

[ F:0 ] = F0

[ F:FF] = 1EF ; [F becomes F0, + FF = 1EF]

[ F000 : FFFF ] = FFFFF (max 20-bit number)

[ FFFF : FFFF ] = 10FFEF (oh shit, 21-bit number!)

So, you can still address more than 20-bits. What happens? The address "wraps around", like modulus arithmetic (as a natural consequence of the hardware). So, 0x10FFEF becomes 0xFFEF .

And there you have it! Intel hired some dumb engineers, and we have to live with it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM