
Efficiency of structs of length 8, and uint64_t

TL;DR: are 8-byte structures handled just as efficiently as an 8-byte uint64_t?

I have 3 data structures that are very similar. They are 6, 7 and 8 bytes long. I want to put them all in uint64_t variables. The goal is that comparisons and assignments will be very efficient. (These values are used as keys in several large trees.)

Example: I have defined the following data structure for the one that is 7 bytes long.

typedef struct {
  union {
    uint64_t raw;
    struct {
      uint8_t unused;
      uint8_t node_number;
      uint8_t systemid[SYSTEMID_LENGTH]; /* SYSTEMID_LENGTH is 6 bytes. */
    } nodeid;
  };
} nodeid_t;

I can now do quick assignments and copies via the raw union member.

nodeid_t x, y;

x.raw = y.raw;
if (x.raw > y.raw) {
  /* ... */
}

Etc, etc.

My question is about use in functions and in assignments. When I pass this struct by value, will the compiler (gcc) recognize that these structures are 8 bytes long, and therefore treat them as if they were int64_t?

Example: will there be efficiency/performance differences between:

int64_t my_function();
nodeid_t my_function();

In other words, will gcc use 1 instruction to put a nodeid_t on the stack, as if it was an integer? Or will it create a loop, and copy 8 bytes one by one? Does this depend on -O optimization?

Same question for assignment.

int64_t a, b;
nodeid_t x, y;

a = b; /* One machine instruction, I hope. */
x = y; /* Also one instruction, or will it do a loop ? */

You cannot be certain that the union has the same size as uint64_t.

This is due to packing in the nodeid struct: compilers are allowed to insert padding between struct members. Some compilers allow you to change the packing arrangements, but then your code will not be portable.

It would be safer to have an array of uint8_t: then the memory would be contiguous.

A compiler will then simply copy the memory on assignment, so you may as well use nodeid_t as your function return type.

Your second job is to rename nodeid_t: _t suffixes are reserved in POSIX C.
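
A minimal sketch of the array-based layout suggested above. The names node_key, KEY_BYTES and node_key_make are hypothetical, and placing the node number in byte 1 is an assumption carried over from the question's struct.

```c
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 8   /* hypothetical constant: total key width in bytes */

/* All eight bytes live in one array, so no padding can appear between them. */
typedef struct {
    uint8_t bytes[KEY_BYTES];
} node_key;

/* Build a key from a 1-byte node number and a 6-byte system id.
   Byte 0 is left as zero (assumed layout, mirroring the question). */
static node_key node_key_make(uint8_t node_number, const uint8_t systemid[6])
{
    node_key k = { {0} };
    k.bytes[1] = node_number;
    memcpy(&k.bytes[2], systemid, 6);
    return k;   /* returned by value, like any other small object */
}
```

Plain assignment (`b = a;`) then copies the whole key, and the type can be used directly as a function parameter or return type.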

Portability aside, if you're after the ultimate low latency, you're on the right track. I've been doing the same for many years.
A few things to note though:
1. Your code, with chars only, should work as is, because the alignment requirement for char is 1.
2. With wider types you'd need to pack struct nodeid. In gcc you do it with __attribute__((packed)). I think MSVC uses #pragma pack(push, 1) ... #pragma pack(pop).
3. Gcc used to have bugs around this area (gaps between bit fields, wrong alignment...) so I strongly suggest using compile-time checks, like STATIC_ASSERT(sizeof(nodeid_t) == sizeof(uint64_t)).
4. If some of the 8 bytes are not populated, make sure you put zeros or some other fixed value in them. Otherwise your comparisons etc. would use random values.
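
Points 3 and 4 could be sketched as follows, assuming a C11 compiler (for static_assert and the anonymous union). The constructor nodeid_make is a hypothetical helper, not something from the question.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

#define SYSTEMID_LENGTH 6

typedef struct {
    union {
        uint64_t raw;
        struct {
            uint8_t unused;
            uint8_t node_number;
            uint8_t systemid[SYSTEMID_LENGTH];
        } nodeid;
    };
} nodeid_t;

/* Fails the build if padding ever makes the struct wider than the integer. */
static_assert(sizeof(nodeid_t) == sizeof(uint64_t),
              "nodeid_t must be exactly 8 bytes");

/* Hypothetical constructor: zero all 8 bytes first so that 'unused'
   never holds garbage that would disturb comparisons on 'raw'. */
static nodeid_t nodeid_make(uint8_t node_number,
                            const uint8_t systemid[SYSTEMID_LENGTH])
{
    nodeid_t id;
    id.raw = 0;
    id.nodeid.node_number = node_number;
    memcpy(id.nodeid.systemid, systemid, SYSTEMID_LENGTH);
    return id;
}
```

Two keys built from the same inputs then always have equal raw values, because no byte is left uninitialized.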

It depends on your architecture. But assuming that you're on x86_64 (which is the most likely), you don't need to do the union hack for copying and function arguments (you'd still need it for comparisons).

struct foo {
    char a;
    char b;
    short c;
    int d;
};

void
foo_copy(struct foo *a, struct foo *b)
{
    *a = *b;
}


extern void bar(struct foo a);
void
foo_value(void)
{
    struct foo f = { .a = 1 };
    bar(f);
}
$ cc -fomit-frame-pointer -O2 -S foo.c
$ cat foo.s
[... cleaned up ...]
_foo_copy:                              ## @foo_copy
    movq    (%rsi), %rax
    movq    %rax, (%rdi)
    retq

_foo_value:                             ## @foo_value
    movl    $1, %edi
    jmp _bar                    ## TAILCALL

Different architectures will have different requirements. A strict-alignment architecture, for example, wouldn't be able to do the copy this way unless the ABI requires larger than usual alignment. Other ABIs might have different calling conventions for structs. So this is hard to answer generally. But if you're on x86_64, you probably don't need to waste time doing this optimization, and if you want the comparisons to be efficient, the union will work like you want.

For something like this, efficiency will likely not be a concern.

That said, this will probably not do what you intend:

if (x.raw > y.raw) {

If you're running on a machine with a little-endian architecture, the least significant byte is stored first. If that's the case, then if for example you have this:

x.nodeid.systemid[0] = 1;
x.nodeid.systemid[1] = 2;
y.nodeid.systemid[0] = 2;
y.nodeid.systemid[1] = 1;

Then (x.raw > y.raw) would evaluate to true, even though x comes first when the system IDs are compared byte by byte.
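
One way around this is to build the integer key with shifts instead of reading it through the union. The sketch below is only an illustration: nodeid_pack is a hypothetical helper, and ordering by system ID first with the node number as tie-breaker is an assumption about what the trees need.

```c
#include <stdint.h>

#define SYSTEMID_LENGTH 6

/* Hypothetical helper: place systemid[0] in the most significant position,
   so integer order equals byte-wise order regardless of host endianness. */
static uint64_t nodeid_pack(uint8_t node_number,
                            const uint8_t systemid[SYSTEMID_LENGTH])
{
    uint64_t key = 0;
    for (int i = 0; i < SYSTEMID_LENGTH; i++)
        key = (key << 8) | systemid[i];
    return (key << 8) | node_number;   /* node number acts as tie-breaker */
}
```

With the values from the example, the key for x now compares less than the key for y on both little-endian and big-endian machines, because shifts operate on values rather than on memory layout.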

I am now inclined to define my 3 data structures simply as typedef uint64_t.

typedef uint64_t isis_simple_item_t;
typedef struct {
  byte_t unused;
  byte_t node_number;
  byte_t systemid[ISIS_SYSTEMID_SIZE];
} isis_complex_item_t;

byte_t number;
isis_simple_item_t nodeid;

/* C cannot cast an integer to a struct directly, so go through a pointer. */
number = ((isis_complex_item_t *) &nodeid)->node_number;

This way I can do quick compares, assignments, function returns, function parameters by value, etc.

And then when I need to access one of the members inside the struct, which happens a lot less, I'll use wrapper functions, with casts inside them from uint64_t to the more complex struct. That also means I don't need the union anymore.
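
A minimal sketch of such wrapper functions. It swaps the cast for memcpy, which avoids aliasing questions and which compilers typically reduce to a plain register move at -O2. The wrapper names and the definition of byte_t as uint8_t are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define ISIS_SYSTEMID_SIZE 6
typedef uint8_t byte_t;   /* assumed definition of byte_t */

typedef uint64_t isis_simple_item_t;
typedef struct {
    byte_t unused;
    byte_t node_number;
    byte_t systemid[ISIS_SYSTEMID_SIZE];
} isis_complex_item_t;

_Static_assert(sizeof(isis_complex_item_t) == sizeof(isis_simple_item_t),
               "struct view must be exactly 8 bytes");

/* Hypothetical wrapper: copy the 8 bytes into the struct view, read one field. */
static byte_t isis_item_node_number(isis_simple_item_t item)
{
    isis_complex_item_t view;
    memcpy(&view, &item, sizeof view);
    return view.node_number;
}

/* Hypothetical wrapper: return a copy of the item with node_number replaced. */
static isis_simple_item_t isis_item_with_node_number(isis_simple_item_t item,
                                                     byte_t n)
{
    isis_complex_item_t view;
    memcpy(&view, &item, sizeof view);
    view.node_number = n;
    memcpy(&item, &view, sizeof item);
    return item;
}
```

Callers keep passing and comparing plain isis_simple_item_t values, and only these helpers know about the struct layout.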

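Instead of all this fancy footwork, why not simply use memcmp? It works on any contiguous data type, is probably implemented as a compiler intrinsic and should therefore be fast, and it definitely sidesteps the strict aliasing rules.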
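
A minimal sketch of the memcmp approach, using a plain struct without the union. The names nodeid_bytes and nodeid_compare are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SYSTEMID_LENGTH 6

typedef struct {
    uint8_t unused;
    uint8_t node_number;
    uint8_t systemid[SYSTEMID_LENGTH];
} nodeid_bytes;   /* hypothetical name for the plain struct */

/* Returns <0, 0 or >0, like memcmp itself. Bytes are compared in memory
   order, so the result does not depend on host endianness. */
static int nodeid_compare(const nodeid_bytes *a, const nodeid_bytes *b)
{
    return memcmp(a, b, sizeof *a);
}
```

Note that memcmp also looks at the unused byte, so it still has to be zeroed (for example with `nodeid_bytes x = {0};`) before the key is filled in.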
