Why is memory alignment needed?

Original 2017-10-28 15:35:08 2 1 assembly/ memory/ x86/ memory-alignment/ low-level

I know that this question has been asked a thousand times and I have read through every single answer and I still don't get it. Probably there is some fundamental error in my model of RAM which makes me unable to comprehend any answers.

I get all these little bits of information thrown at me from all around the internet, but I just can't connect them.

Here is what I think I know so far: take the IA-32 architecture, for example. It has a word boundary of 32 bits (boundary = the maximum the CPU can read from memory?). It will always read in its word boundary.

1) So, whatever address I give it, it will always read 4 bytes? What if I have a simple char at address x. Will it read 4 bytes from that address and then do something weird to only get the one byte?

2) If that is so, then is a string (a sequence of char) n_chars * 4 Bytes big? I'm pretty sure it isn't that way, but how am I supposed to interpret "will always read its word boundary" then?

3) Memory alignment seems to only come up with Data structures. Why? Is Memory unaligned in the rest of the memory? And I mean for Physical, Virtual, Kernel Space etc?

4) Why can I store a 32 Bit value only at addresses divisible by 4? I mean I get that it will eventually read only 32 bits, but why can it not read 32 bits from an odd address? Like what is the restriction here?

I'm just so confused, please help me.

1 answer

In modern computers, memory is byte oriented. Each byte has its own address and can be fetched from RAM individually. For the sake of your program, you can assume that fetching a word behaves like fetching the bytes that make it up in an arbitrary order and then assembling them to a word in the register you load to.
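As a rough illustration of that mental model, the sketch below copies a 32-bit value out of memory one byte at a time (deliberately in reverse address order) and compares it with a single word-sized copy. The function names `load_u32_bytewise` and `load_u32_whole` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a 32-bit value out of memory one byte at a time, visiting the
 * bytes in reverse address order to show that the order does not matter. */
uint32_t load_u32_bytewise(const unsigned char *src)
{
    unsigned char tmp[sizeof(uint32_t)];
    uint32_t value;

    assert(src != NULL);
    for (size_t i = sizeof tmp; i-- > 0; ) {
        tmp[i] = src[i];               /* each byte has its own address: src + i */
    }
    memcpy(&value, tmp, sizeof value); /* reassemble the bytes into one word */
    return value;
}

/* Load the same value with a single word-sized copy, for comparison. */
uint32_t load_u32_whole(const unsigned char *src)
{
    uint32_t value;

    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```

For any four bytes in memory, both functions should return the same value, whatever the byte order of the machine.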

Note that this is an abstraction. Memory chips are usually wired up in a way that 8 or more bytes are fetched at once. The CPU has some circuitry to abstract all of this away from the machine code. However, this abstraction is leaky, which causes a number of effects:

If a datum is not aligned to its alignment requirement, memory access can take extra cycles because the datum spans more words than necessary. This penalty is avoided by aligning data sufficiently.
When fetching or writing an aligned datum, this translates into a single fetch or store in the hardware. Such a fetch or store is atomic, which is an important property in concurrent code. When fetching or writing unaligned data, more than one fetch or store may be needed and the operation is no longer guaranteed to be atomic.
Some CPUs do not support reading/writing unaligned memory at all, as this simplifies circuit design. This restriction has become increasingly rare in contemporary hardware.
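The "spans more words than necessary" effect can be sketched with a little arithmetic. This is only a model: `chunks_touched` and `is_aligned` are hypothetical helpers, and `width` stands for whatever chunk size the hardware happens to fetch at once (8 bytes is assumed in the usage note).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of fetch-width chunks touched by an access of `size` bytes
 * starting at byte address `addr`. `width` is an assumed chunk size. */
size_t chunks_touched(uintptr_t addr, size_t size, size_t width)
{
    assert(size > 0 && width > 0);
    uintptr_t first = addr / width;               /* chunk holding the first byte */
    uintptr_t last  = (addr + size - 1) / width;  /* chunk holding the last byte */
    return (size_t)(last - first + 1);
}

/* Nonzero if `addr` is a multiple of `alignment`. */
int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment > 0);
    return addr % alignment == 0;
}
```

With 8-byte chunks, `chunks_touched(4, 4, 8)` is 1 because the aligned 4-byte datum sits inside one chunk, while `chunks_touched(6, 4, 8)` is 2 because the datum straddles a chunk boundary and needs two fetches in this model.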

So now, for your questions:

1) So, whatever address I give it, it will always read 4 bytes? What if I have a simple char at address x. Will it read 4 bytes from that address and then do something weird to only get the one byte?

Maybe. This depends on the hardware you use. But yes, you are going to get only one byte if you requested one byte. You shouldn't be concerned with how many bytes the hardware reads to give you that one byte.
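To make the "read 4 bytes and then pick one" idea concrete, here is a toy model of memory that can only be read in aligned 32-bit words. It is not a description of any particular CPU; `read_aligned_word` and `read_byte_via_word` are invented names, and the little-endian lane numbering is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: memory can only be read as whole, aligned 32-bit words. */
static uint32_t read_aligned_word(const unsigned char *mem, size_t word_addr)
{
    assert(word_addr % 4 == 0);
    /* Assemble little-endian so the result does not depend on the host. */
    return (uint32_t)mem[word_addr]
         | (uint32_t)mem[word_addr + 1] << 8
         | (uint32_t)mem[word_addr + 2] << 16
         | (uint32_t)mem[word_addr + 3] << 24;
}

/* Read one byte at `addr` by fetching the enclosing word and selecting a lane. */
unsigned char read_byte_via_word(const unsigned char *mem, size_t addr)
{
    size_t   word_addr = addr & ~(size_t)3;          /* round down to a multiple of 4 */
    unsigned shift     = (unsigned)(addr & 3) * 8;   /* byte lane within the word */
    uint32_t word      = read_aligned_word(mem, word_addr);

    return (unsigned char)((word >> shift) & 0xFFu);
}
```

From the program's point of view, `read_byte_via_word(mem, i)` gives the same result as `mem[i]`; the extra three bytes fetched along the way are invisible.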

2) If that is so, then is a string (a sequence of char) n_chars * 4 Bytes big? I'm pretty sure it isn't that way, but how am I supposed to interpret "will always read its word boundary" then?

A string is normally n_chars bytes big. When you read one char from the string, you get one byte. The hardware might read more bytes to fulfill your request, but that's not something you need to care about. Note that Windows sometimes uses UTF-16 strings, which occupy at least two bytes per character, but this trend hasn't really caught on.
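A small check in C, assuming ordinary NUL-terminated `char` strings: the storage is one byte per character plus the terminator, with no padding between characters. `string_storage_bytes` is a hypothetical helper used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The C standard defines sizeof(char) as 1, so chars are packed back to back. */
_Static_assert(sizeof(char) == 1, "char is one byte by definition");

/* Bytes needed to store a NUL-terminated string of `n_chars` characters. */
size_t string_storage_bytes(size_t n_chars)
{
    assert(n_chars < (size_t)-1);   /* leave room for the terminator */
    return n_chars * sizeof(char) + 1;
}
```

For example, `sizeof("hello")` equals `string_storage_bytes(5)`, which is 6 bytes rather than 20.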

3) Memory alignment seems to only come up with Data structures. Why? Is Memory unaligned in the rest of the memory? And I mean for Physical, Virtual, Kernel Space etc?

Memory alignment matters whenever you consider data in RAM. It doesn't matter if that memory is used inside the kernel or your user process. The MMU generally maps memory in a way that preserves alignment so it doesn't matter if you use physical or virtual memory. Data on disk doesn't have these alignment requirements but other performance characteristics might apply due to the sector size of the storage you use.
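Why the mapping preserves alignment can be sketched as follows, assuming 4096-byte pages (real systems vary). Translation swaps the page number for a frame base and keeps the offset within the page, so any alignment up to the page size carries over. `translate` and `TOY_PAGE_SIZE` are made-up names, and `frame_base` stands in for the result of a page-table lookup.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Toy translation: replace the page number, keep the offset within the page.
 * `frame_base` stands in for whatever a page-table lookup would return. */
uint64_t translate(uint64_t vaddr, uint64_t frame_base)
{
    assert(frame_base % TOY_PAGE_SIZE == 0);  /* frames start on page boundaries */
    return frame_base + (vaddr % TOY_PAGE_SIZE);
}
```

Because `frame_base` is a multiple of the page size, `translate(v, f) % 4` equals `v % 4`, and the same holds for any power-of-two alignment up to 4096 in this model.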

4) Why can I store a 32 Bit value only at addresses divisible by 4? I mean I get that it will eventually read only 32 bits, but why can it not read 32 bits from an odd address? Like what is the restriction here?

If you read 32 bits from an odd address, one of the following things happens depending on your CPU and operating system:

It just works
It works but is a little bit slower
The CPU silently ignores the low 2 bits and reads from the corresponding aligned address instead (this is rare nowadays)
The CPU throws an exception which crashes your program if you don't handle it
The CPU throws an exception which the operating system catches to emulate the memory access for you.

You generally shouldn't assume which of these happens. Never write code that reads unaligned data. If you need to read unaligned data, consider reading each byte on its own and then manually reassembling the bytes into the datum you want.
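A minimal sketch of that byte-by-byte approach in C follows. `load_le32` assumes the value is stored in little-endian order (as on x86); `load_native32` is an alternative that uses `memcpy` into an aligned local and yields the host's byte order. Both names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from any address, aligned or not,
 * by fetching each byte on its own and reassembling them. */
uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Alternative: copy the bytes into an aligned local variable.
 * The result is in the host's native byte order. */
uint32_t load_native32(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Neither function ever dereferences a misaligned `uint32_t` pointer, so they behave the same regardless of which of the outcomes listed above the CPU would choose for a direct unaligned load.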

solution1 7 ACCEPTED 2017-10-28 20:15:09