
C dummy struct, strict aliasing and static initialization

My first question wasn't well formulated, so here it is again, this time better phrased and explained.

I want to hide the members of a struct while still being able to allocate the struct statically or on the stack. Most solutions out there use the opaque pointer idiom with dynamic memory allocation, which isn't always desirable.

The idea for this example came from the following post:

https://www.reddit.com/r/C_Programming/comments/aimgei/opaque_types_and_static_allocation/

I know this is probably undefined behavior (UB), but I believe it should work fine on most consumer architectures, whether 32-bit or 64-bit.

You may object that size_t can sometimes be bigger than void *, and that relying on the void * member to force the union's alignment to that of void * may be wrong. That can happen, but I see it as the exception, not the rule.

Most compilers add padding to align a struct to a multiple of 4 or 8 bytes, depending on the architecture, and sizeof returns the size including that padding. So sizeof(Vector) and sizeof(RealVector) should be the same, and since Vector and RealVector also have the same alignment, the cast should be fine too.

If this is UB, how can I create this sort of scratchpad structure in C in a safe manner? In C++ we have alignas, alignof and placement new, which help make this ordeal a lot safer.

If that's not possible in C99, would it be safer in C11 with alignas and alignof?

#include <stdint.h>
#include <stdio.h>

/* In .h */

typedef union Vector {
    uint8_t data[sizeof(void *) + 2 * sizeof(size_t)];
    /* this member is here to force the alignment of the union to that of void * */
    void * alignment;
} Vector;

void vector_initialize_version_a(Vector *);
void vector_initialize_version_b(Vector *);
void vector_debug(Vector const *);

/* In .c */

typedef struct RealVector {
    uint64_t * data;
    size_t length;
    size_t capacity;
} RealVector;

void
vector_initialize_version_a(Vector * const t) {
    RealVector * const v = (RealVector *)t;
    v->data = NULL;
    v->length = 0;
    v->capacity = 8;
}

void
vector_initialize_version_b(Vector * const t) {
    *(RealVector *)t = (RealVector) {
        .data = NULL,
        .length = 0,
        .capacity = 16,
    };
}

void
vector_debug(Vector const * const t) {
    RealVector const * v = (RealVector const *)t; /* keep const: this function only reads */
    printf("Length: %zu\n", v->length);
    printf("Capacity: %zu\n", v->capacity);
}

/* In main.c */

int
main(void) {
    /*
    Compiled with:
    clang -std=c99 -O3 -Wall -Werror -Wextra -Wpedantic test.c -o main.exe
    */

    printf("%zu == %zu\n", sizeof(Vector), sizeof(RealVector));

    Vector vector;

    vector_initialize_version_a(&vector);
    vector_debug(&vector);

    vector_initialize_version_b(&vector);
    vector_debug(&vector);

    return 0;
}

Why not something simpler? This avoids the pointer punning:

typedef struct RealVector {
    uint64_t * data;
    size_t length;
    size_t capacity;
} RealVector;

typedef struct Vector {
    uint8_t data[sizeof(RealVector)];
} Vector;

typedef union
{
    Vector      v;
    RealVector rv;
} RealVector_union;

void vector_initialize_version_a(void * const t) {
    RealVector_union * const v = t;
    v->rv.data = NULL;
    v->rv.length = 0;
    v->rv.capacity = 8;
}


I'll post my answer from the previous question, which I didn't have time to post :)

Am I safe doing this?

No, you are not. But instead of finding a way to make it safe, just fail the build when it's not safe:

#include <assert.h>
#include <stdalign.h>
static_assert(sizeof(Vector) == sizeof(RealVector), "");
static_assert(alignof(Vector) == alignof(RealVector), "");

With checks written this way, you will know at compile time when there's going to be a problem, and you can then fix it for that specific environment. And if the checks don't fire, you know the size and alignment assumptions hold.
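
Since the question targets C99, where static_assert and alignof are not available, the same checks can be approximated with older idioms. The following is only a sketch under assumptions: STATIC_CHECK, VectorProbe, RealVectorProbe, VECTOR_ALIGN and REAL_VECTOR_ALIGN are hypothetical names, and the offsetof trick reports the alignment the compiler uses in practice rather than something the standard guarantees.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical C99 stand-in for static_assert: an array type with a
   negative size is invalid, so a false condition breaks the build. */
#define STATIC_CHECK(name, cond) \
    typedef char static_check_##name[(cond) ? 1 : -1]

/* Public type, as in the question (unsigned char used for raw bytes). */
typedef union Vector {
    unsigned char data[sizeof(void *) + 2 * sizeof(size_t)];
    void *alignment;
} Vector;

/* Private type, as in the question. */
typedef struct RealVector {
    uint64_t *data;
    size_t length;
    size_t capacity;
} RealVector;

/* C99 has no alignof. The offset of a member placed right after a single
   char is, in practice, that member's alignment requirement. */
struct VectorProbe     { char c; Vector v; };
struct RealVectorProbe { char c; RealVector v; };

#define VECTOR_ALIGN      offsetof(struct VectorProbe, v)
#define REAL_VECTOR_ALIGN offsetof(struct RealVectorProbe, v)

/* The build fails here if either assumption does not hold. */
STATIC_CHECK(same_size,  sizeof(Vector) == sizeof(RealVector));
STATIC_CHECK(same_align, VECTOR_ALIGN == REAL_VECTOR_ALIGN);
```

These checks only cover size and alignment. They say nothing about the aliasing rules, so the cast in the question remains questionable even when both checks pass.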

how can I create a sort of scratchpad structure in C in a safe manner?

The only correct way of doing it really safely would be a two-step process:

  • first compile a test executable that outputs the size and alignment of struct RealVector
  • then generate the header file with the proper structure definition: struct Vector { alignas(REAL_VECTOR_ALIGNMENT) unsigned char data[REAL_VECTOR_SIZE]; };
  • and then continue with compiling the final executable
  • The test and final executables must be compiled with the same compiler, version, options, settings and environment.
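
The generator step could look roughly like the sketch below. It assumes C11 and that the generator is built from a source file that can see the real struct definition. The function name format_vector_header is hypothetical, while REAL_VECTOR_SIZE and REAL_VECTOR_ALIGNMENT are the macro names used in the list above.

```c
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* size_t */
#include <stdint.h>
#include <stdio.h>     /* snprintf */

/* The private definition, visible only to the generator and to vector.c. */
struct RealVector {
    uint64_t *data;
    size_t length;
    size_t capacity;
};

/* Hypothetical helper: formats the text of the generated public header
   into buf. Returns what snprintf returns, i.e. the number of characters
   the full text needs (excluding the terminating null character). */
int format_vector_header(char *buf, size_t cap) {
    return snprintf(buf, cap,
        "#include <stdalign.h>\n"
        "#define REAL_VECTOR_SIZE %zu\n"
        "#define REAL_VECTOR_ALIGNMENT %zu\n"
        "struct Vector {\n"
        "    alignas(REAL_VECTOR_ALIGNMENT) unsigned char data[REAL_VECTOR_SIZE];\n"
        "};\n",
        sizeof(struct RealVector), alignof(struct RealVector));
}
```

A small main in the test executable would call format_vector_header on a local buffer and write the result to the generated header file (or to stdout, redirected by the build system) before the final executable is compiled.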

Notes:

  • Instead of a union, use a struct with alignas.
  • uint8_t is an integer type with exactly 8 bits. Use char, or better unsigned char, to represent a "byte".
  • sizeof(void*) is not guaranteed to be equal to sizeof(uint64_t*).
  • "max alignment is either 4 or 8": not necessarily. On x86_64, alignof(long double) is typically 16.
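
Put together, these notes suggest something like the following C11 sketch. Several parts are assumptions that do not come from the answer: VECTOR_STORAGE_SIZE is a hypothetical, deliberately generous size guess, max_align_t is used as a conservative alignment, the function names are made up, and memcpy is swapped in for the pointer cast so that no RealVector lvalue ever refers to the byte storage.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>    /* size_t, max_align_t, NULL */
#include <stdint.h>
#include <string.h>    /* memcpy */

/* "Header" part: opaque storage made of bytes, conservatively aligned. */
#define VECTOR_STORAGE_SIZE (4 * sizeof(void *))  /* hypothetical guess */

typedef struct Vector {
    alignas(max_align_t) unsigned char data[VECTOR_STORAGE_SIZE];
} Vector;

/* "Implementation" part: the real layout stays private to the .c file. */
typedef struct RealVector {
    uint64_t *data;
    size_t length;
    size_t capacity;
} RealVector;

/* The storage only has to be big enough and aligned strictly enough. */
static_assert(sizeof(RealVector) <= sizeof(Vector), "Vector storage too small");
static_assert(alignof(RealVector) <= alignof(Vector), "Vector storage under-aligned");

void vector_initialize(Vector *t) {
    RealVector v = { .data = NULL, .length = 0, .capacity = 8 };
    memcpy(t->data, &v, sizeof v);  /* copy the bytes into the storage */
}

size_t vector_capacity(Vector const *t) {
    RealVector v;
    memcpy(&v, t->data, sizeof v);  /* copy the bytes back out */
    return v.capacity;
}
```

A caller can then write Vector vector; vector_initialize(&vector); with no heap allocation. The cost of this design is a small copy in every accessor, which compilers usually optimize away, and some wasted space whenever the guessed size is larger than needed.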

One possibility is to define Vector as follows in the .h file:

/* In vector.h file */
struct RealVector {
    uint64_t * data;
    size_t length;
    size_t capacity;
};

typedef union Vector {
    char data[sizeof(struct RealVector)];
    /* these members are here to force the alignment of the union */
    uint64_t * alignment1_;
    size_t alignment2_;
} Vector;

That also defines struct RealVector for use in the vector implementation .c file:

/* In vector.c file */
typedef struct RealVector RealVector;

This has the advantage that the binary contents of Vector actually consist of a RealVector and are correctly aligned. The disadvantage is that a sneaky user could easily manipulate the contents of a Vector via pointer casts.

A not-so-legitimate alternative is to remove struct RealVector from the .h file and replace it with an anonymous struct type of the same shape:

/* In vector.h file */
typedef union Vector {
    char data[sizeof(struct { uint64_t * a; size_t b; size_t c; })];
    /* these members are here to force the alignment of the union */
    uint64_t * alignment1_;
    size_t alignment2_;
} Vector;

Then struct RealVector needs to be fully defined in the vector implementation .c file:

/* In vector.c file */
typedef struct RealVector {
    uint64_t * data;
    size_t length;
    size_t capacity;
} RealVector;

This has the advantage that a sneaky user cannot easily manipulate the contents of a Vector without first defining another struct type of the same shape as the anonymous struct type. The disadvantage is that the anonymous struct type that forms the binary representation of Vector is not technically compatible with the RealVector type used in the vector implementation .c file because the tags and member names are different.
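
Because the two definitions now live in different files, they can silently drift apart. One possible safeguard, assuming C11 is available, is to repeat the size and alignment checks in the .c file, which sees both types. This is a sketch and the assertion messages are illustrative.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* size_t */
#include <stdint.h>

/* As declared in vector.h: layout described by an anonymous struct. */
typedef union Vector {
    char data[sizeof(struct { uint64_t * a; size_t b; size_t c; })];
    /* these members are here to force the alignment of the union */
    uint64_t * alignment1_;
    size_t alignment2_;
} Vector;

/* As defined in vector.c. */
typedef struct RealVector {
    uint64_t * data;
    size_t length;
    size_t capacity;
} RealVector;

/* If someone changes RealVector without updating vector.h, the build
   stops here instead of corrupting memory at run time. */
static_assert(sizeof(Vector) == sizeof(RealVector),
              "vector.h is out of sync with RealVector (size)");
static_assert(alignof(Vector) == alignof(RealVector),
              "vector.h is out of sync with RealVector (alignment)");
```

This catches layout drift but does not make the two struct types compatible in the sense of the standard, so the disadvantage described above still applies.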
