简体   繁体   中英

Efficiency of structs of length 8, and uint64_t

TLDR: are 8-byte structures handled just as efficiently as 8-byte uint64_t?

I have 3 data structures that are very similar. They are 6, 7 and 8 bytes long. I want to put them all in uint64_t variables. The goal is that comparisons and assignments will be very efficient. (These values are used as key in several (large) trees).

Example: I have defined the following data-structure, for the one that is 7 bytes long.

typedef struct {
  union {
    uint64_t raw;
    struct {
      uint8_t unused;
      uint8_t node_number;
      uint8_t systemid[SYSTEMID_LENGTH]; /* SYSTEMID_LENGTH is 6 bytes. */
    } nodeid;
  };
} nodeid_t;

I can now do quick assignments and copies via the raw union member.

nodeid_t x, y;

x.raw = y.raw
if (x.raw > y.raw) {

Etc, etc.

My question is about use in functions and in assignments. When I pass this struct by value, will the compiler (gcc) recognize that these structures are 8 bytes long. And therefore treat as if they are int64_t?

Example: Will there be efficiency/performance differences between:

int64_t my_function();
nodeid_t my_function();

In other words, will gcc use 1 instruction to put a nodeid_t on the stack, as if it was an integer? Or will it create a loop, and copy 8 bytes one by one? Does this depend on -O optimization?

Same question for assignment.

int64_t a, b;
nodeid_t x, y;

a = b; /* One machine instruction, I hope. */
x = y; /* Also one instruction, or will it do a loop ? */

You cannot be certain that the union has the same size as uint64_t .

This is due to packing in the nodeid struct : compilers will often insert gaps between struct members. Some compilers allow you to change the packing arrangements but then your code will not be portable.

It would be safer to have an array of uint8_t : then the memory would be contiguous.

A compiler will then simply copy the memory on assignment, so you may as well use nodeid_t as your function return types.

Your second job is to rename nodeid_t : _t suffixes are reserved in POSIX C.

Portability aside, if you're after the ultimate low latency, you're on the right track. I've been doing the same for many years.
A few things to note though:
1. Your code, with chars only, should work as is, because the alignment requirement for char is 1.
2. With wider types you'd need to pack struct nodeid. In gcc you do it with __attribute__((packed)) . I think MSVC uses #pragma push pack(1)...#pragma pop .
3. Gcc used to have bugs around this area (gaps between bit fields, wrong alignment...) so I strongly suggest using compile-time checks, like STATIC_ASSERT(sizeof(nodeid_t) == sizeof(uint64_t))
4. If some of the 8 bytes are not populated, make sure you put zeros or something in them. Otherwise your comparisons etc would use random values.

It depends on your architecture. But assuming that you're on x86_64 (which is the most likely), you don't need to do the union hack for copying and function arguments (you'd still need it for comparisons).

struct foo {
    char a;
    char b;
    short c;
    int d;
};

void
foo_copy(struct foo *a, struct foo *b)
{
    *a = *b;
}


extern void bar(struct foo a);
void
foo_value(void)
{
    struct foo f = { .a = 1 };
    bar(f);
}
$ cc -fomit-frame-pointer -O2 -S foo.c
$ cat foo.s
[... cleaned up ...]
_foo_copy:                              ## @foo_copy
    movq    (%rsi), %rax
    movq    %rax, (%rdi)
    retq

_foo_value:                             ## @foo_value
    movl    $1, %edi
    jmp _bar                    ## TAILCALL

Different architectures will have different requirements, a strict alignment architecture for example wouldn't be able to do the copy unless the ABI requires larger than usual alignment. Other ABIs might have different calling conventions for structs. So this is hard to answer generally. But if you're on x86_64, you probably either don't need to waste time doing this optimization, or if you want the comparisons to be efficient, this will work like you want.

For something like this, efficiency will likely not be a concern.

That said, this will probably not do what you intend:

if (x.raw > y.raw) {

If you're running on a machine with a little-endian architecture, the least significant byte is stored first. If that's the case, then if for example you have this:

x.nodeid.systemid[0] = 1;
x.nodeid.systemid[1] = 2;
y.nodeid.systemid[0] = 2;
y.nodeid.systemid[1] = 1;

Then (x.raw > y.raw) would evaluate to true.

I am now inclined to define my 3 datastructures simply as typedef uint64_t.

typedef uint64_t isis_simple_item_t;
typedef struct isis_complex_item_t {
  byte_t unused;
  byte_t node_number;
  byte_t systemid[ISIS_SYSTEMID_SIZE];
};

byte_t number;
isis_simple_item_t nodeid;

number = ((isis_complex_item) nodeid).node_number;

This way I can do quick compares, assignement, function returns, function parameters-by-value, etc.

And then when I need to access one of the members inside the struct, which happens a lot less, I'll use wrapper functions. With casts inside them, from uint64_t to the more complex struct. That also means I don't need the union anymore.

而不是所有这些花哨的步法,为什么不简单地使用memcmp ,这适用于任何连续的数据类型可能实现为编译内在因此应该快速并且绝对正确地避开严格的别名规则。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM