
C: how does dot/arrow operator work under the hood?

From what I understand, given

int *x = malloc(10 * sizeof(int));
x[5] = 13;

malloc just allocates empty space (with no assumption about the object that will be put there), and x[5] translates to *(x + 5) which is treated as an integer. So, it is left to the [] operator to create the illusion of an array.

But what happens in the following case?

struct test {
    int a;
    char b;
};

struct test* x = malloc(sizeof(struct test));
x->a = 3;
x->b = 'a';

Do x->a and x->b translate to some memory position in some regular way, like the [i] operator does? Does the C reference state anything, or is it implementation specific? I've been looking through various books, but, contrary to arrays, structs are always presented as a black box.

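Let's say an int is 4 bytes and a char is 1 byte (typical values, though the standard does not fix them). Then the members of struct test occupy 5 consecutive bytes in memory (first a (4 bytes) and then b (1 byte)), although sizeof(struct test) is usually 8, because the compiler pads the end so that a stays aligned in arrays of the struct.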

If you then access x->b, you are pointing to the start of that struct plus an offset of 4 bytes. (Since x is a pointer, ->a kind of means +0 and ->b kind of means +4.)

malloc just allocates empty space (with no assumption about the object that will be put there)

Correct. Dynamically allocated memory specifically has no type until the point where you write something to that area. Formally, the C language calls this the effective type. The formal definition is found in C17 6.5/6:

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

What's returned from malloc is just a raw chunk of memory with no special attributes until the point where you write to that area, after which the compiler has to put a "type label" on it internally. As soon as you access it by using [], the compiler has to assume that the allocated data is to be treated as an array, to keep the type system consistent between statically allocated and dynamically allocated objects.

Similarly, the memory area becomes a struct at the point when you access it through the struct type; that type determines the padding and dictates the memory offset of each member. So given a struct with the members in the opposite order from your example, like this:

struct test {
    char a;
    int  b;
};

Then it is implementation-defined whether x->b will result in access to byte 1, byte 4 or something else, since the compiler is free to add padding between the members.

But as soon as you access x->something , the compiler will have to start regarding whatever x points at as effective type struct test , or the type system wouldn't behave consistently.
