malloc once, then distribute memory over struct arrays

I have a struct that has the following memory layout:

uint32_t  
variable length array of type uint16_t
variable length array of type uint16_t

Because the arrays have variable length, I store pointers to them, effectively:

struct struct1 {
  uint32_t n;
  uint16_t *array1;
  uint16_t *array2;
};
typedef struct struct1 struct1;

Now, when allocating these structs, I see two options:

A) malloc the struct itself, then malloc space for the arrays individually and set the pointers in the struct to point to the correct memory locations:

uint32_t n1 = 10;
uint32_t n2 = 20;

struct1 *s1 = malloc(sizeof(struct1));
uint16_t *array1 = malloc(sizeof(uint16_t) * n1);
uint16_t *array2 = malloc(sizeof(uint16_t) * n2);
s1->n = n1;
s1->array1 = array1;
s1->array2 = array2;

B) malloc memory for everything combined, then "distribute" the memory over the struct:

struct1 *s1 = malloc(sizeof(struct1) + (n1 + n2) * sizeof(uint16_t));
s1->n = n1;
s1->array1 = (uint16_t *)(s1 + 1);   /* first byte after the struct */
s1->array2 = s1->array1 + n1;        /* pointer arithmetic counts elements, not bytes */

Note that array1 and array2 are not bigger than a few KB, and usually not many struct1s are needed. However, cache efficiency is a concern, as numeric data crunching is done with this struct.

  1. Is approach B) possible, and if so, is it better (faster) than A) in terms of memory locality?
  2. I am not very familiar with C; is there a better way of doing B (or A), i.e. using memcpy or realloc or something?
  3. Anything else to be mindful about in this situation?

Note that right now I'm using gcc (C89?) on Linux but could use C99/C11 if necessary. Thanks in advance.

EDIT: To clarify further: the size of the arrays will never change after creation. Multiple struct1s will not always be allocated at once, but rather occasionally during the program's runtime.

I think your option A is much cleaner and would scale in a more sensible way. Imagine having to realloc space when the array in one of the structures becomes larger: in option A, you can realloc that memory since it isn't logically attached to anything else. In option B, you need to add additional logic to ensure you don't break your other array.
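To illustrate the realloc point, here is a minimal sketch of growing one array under option A. The helper name struct1_resize_array1 is hypothetical, and it assumes that n records the length of array1, as in the question's code.

```c
#include <stdint.h>
#include <stdlib.h>

struct struct1 {
    uint32_t  n;
    uint16_t *array1;
    uint16_t *array2;
};

/* Hypothetical helper: resize array1 only. array2 lives in its own
   allocation, so it is not touched. Returns 0 on success, -1 on failure. */
int struct1_resize_array1(struct struct1 *s, uint32_t new_n)
{
    uint16_t *tmp;

    if (new_n == 0)
        return -1;                      /* keep the sketch simple: no empty arrays */

    tmp = realloc(s->array1, (size_t)new_n * sizeof *tmp);
    if (!tmp)
        return -1;                      /* the old block is still valid */

    s->array1 = tmp;
    s->n = new_n;
    return 0;
}
```

With option B the same operation would have to move array2 as well, because both arrays share one block.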

I also think (even in C89, but I could be wrong) that there is nothing wrong with this:

struct1 *s1 = malloc(sizeof(struct1));
s1->array1 = malloc(sizeof(uint16_t) * n1);
s1->array2 = malloc(sizeof(uint16_t) * n2);
s1->n = n1;

The above takes out the middle-man arrays. I think it is cleaner because you immediately see that you are allocating space for a pointer in a structure.

I have used your option B before for 2D arrays, where I just allocate a single space and use logical rules in my code to treat it as a 2D space. This is useful when I want it to be a rectangular 2D space, so when I grow it, I always grow every row or every column. In other words, I never want to have heterogeneous array sizes.
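As a rough sketch of that 2D case: one allocation holds all cells, and an indexing rule maps (row, column) to an offset. The names struct grid, grid_init and grid_at are hypothetical, and row-major order is an assumption.

```c
#include <stdlib.h>

/* Hypothetical type: a rows x cols rectangle stored in one allocation. */
struct grid {
    size_t rows, cols;
    int   *cells;           /* rows * cols elements, row-major */
};

/* Allocate a zero-filled grid. Returns 0 on success, -1 on failure. */
int grid_init(struct grid *g, size_t rows, size_t cols)
{
    g->cells = calloc(rows * cols, sizeof *g->cells);
    if (!g->cells)
        return -1;
    g->rows = rows;
    g->cols = cols;
    return 0;
}

/* The "logical rule": element (r, c) lives at offset r * cols + c. */
int *grid_at(struct grid *g, size_t r, size_t c)
{
    return &g->cells[r * g->cols + c];
}
```

A single free(g.cells) releases the whole rectangle.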

Update: 'Arrays will never change in size'

Because you clarified that your structures/arrays will never need to be reallocated, I think option B is less bad. It still seems to be a worse solution for this application than option A, and here are my reasons for thinking this:

  • malloc is already optimized, so there is not much to gain from allocating a single space compared to allocating the spaces individually.
  • The ability of other engineers to look at and immediately understand your code would be reduced. To be clear, any competent software engineer should be able to look at option B and figure out what the writer of the code was doing, but it could very well waste that engineer's brain-cycles and could cause a junior engineer to misunderstand the code and create a bug.

So, if you comment the code thoroughly, and your application absolutely requires you to optimize everything you possibly can, at the expense of clean and logically sensible code (where memory space and data structures are logically separated in a similar way), and you know that this optimization is better than what a good compiler (like Clang) can do, then option B could be a better option.

Update: Testing

In the spirit of self-criticism, I wanted to see if I could evaluate the difference. So I wrote two programs (one for option A and one for option B) and compiled them with optimizations off. I used a FreeBSD virtual machine to get as clean an environment as possible, and I used gcc.

Here are the programs that I used to test the two methods:

optionA.c:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define NSIZE   100000
#define NTESTS  10000000

struct test_struct {
    int n;
    int *array1;
    int *array2;
};

void freeA(struct test_struct *input) {
    free(input->array1);
    free(input->array2);
    free(input);
    return;
}

void optionA() {
    struct test_struct *s1 = malloc(sizeof(*s1));
    s1->array1 = malloc(sizeof(*(s1->array1)) * NSIZE);
    s1->array2 = malloc(sizeof(*(s1->array1)) * NSIZE);
    s1->n = NSIZE;
    freeA(s1);
    s1 = 0;
    return;
}

int main() {
    clock_t beginA = clock();
    int i;
    for (i=0; i<NTESTS; i++) {
        optionA();
    }
    clock_t endA = clock();
    int time_spent_A = (endA - beginA);
    printf("Time spent for option A: %d\n", time_spent_A);
    return 0;
}

optionB.c:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define NSIZE   100000
#define NTESTS  10000000

struct test_struct {
    int n;
    int *array1;
    int *array2;
};

void freeB(struct test_struct *input) {
    free(input);
    return;
}

void optionB() {
    struct test_struct *s1 = malloc(sizeof(*s1) + 2*NSIZE*sizeof(*(s1->array1)));
    s1->array1 = (int *)(s1 + 1);        /* first byte after the struct */
    s1->array2 = s1->array1 + NSIZE;     /* pointer arithmetic counts elements, not bytes */
    s1->n = NSIZE;
    freeB(s1);
    s1 = 0;
    return;
}

int main() {
    clock_t beginB = clock();
    int i;
    for (i=0; i<NTESTS; i++) {
        optionB();
    }
    clock_t endB = clock();
    int time_spent_B = (endB - beginB);
    printf("Time spent for option B: %d\n", time_spent_B);
    return 0;
}

Results for these tests are given in clocks (see clock(3) for more information).

 Series | Option A | Option B
------------------------------
 1      | 332      | 158
------------------------------
 2      | 334      | 155
------------------------------
 3      | 334      | 156
------------------------------
 4      | 333      | 154
------------------------------
 5      | 339      | 156
------------------------------
 6      | 334      | 155
------------------------------
 avg    | 336.0    | 155.7
------------------------------

Note that these speeds are still incredibly fast and translate to milliseconds over millions of tests. I have also found that Clang (cc) is better than gcc at optimizing. On my machine, even after writing a method that writes data to the arrays (to ensure they don't get optimized out of existence), I got no difference between the two methods when compiling with cc.

I would advise a hybrid of the two:

  1. allocate the structs in one call (it is now an array of structs);

  2. allocate the arrays in one call, and make sure the size includes any padding for the alignment required by your compiler/platform;

  3. distribute the arrays over the structs, taking the alignment into account.

However, malloc is already optimized, so your first solution would still be preferred.

Note: as user Greg Schmit's answer points out, allocating all the arrays at once will cause difficulties if the array size needs to be changed at run time.
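The three steps of the hybrid could look roughly like the sketch below. The helper names struct1_new_many and struct1_free_many are hypothetical, and it assumes every struct gets the same n1 and n2. Because all arrays here are uint16_t and the pool is itself a uint16_t allocation, no extra alignment padding is needed; with mixed element types the offsets would have to be rounded up.

```c
#include <stdint.h>
#include <stdlib.h>

struct struct1 {
    uint32_t  n;
    uint16_t *array1;
    uint16_t *array2;
};

/* Hypothetical helper: create `count` structs, each with arrays of
   n1 and n2 elements, using two malloc calls in total. */
struct struct1 *struct1_new_many(size_t count, uint32_t n1, uint32_t n2)
{
    const size_t per_struct = (size_t)n1 + n2;   /* elements per struct */
    struct struct1 *s;
    uint16_t *pool;
    size_t i;

    if (count == 0 || per_struct == 0)
        return NULL;

    s = malloc(count * sizeof *s);                    /* step 1: the structs */
    pool = malloc(count * per_struct * sizeof *pool); /* step 2: all arrays  */
    if (!s || !pool) {
        free(s);
        free(pool);
        return NULL;
    }

    for (i = 0; i < count; i++) {                     /* step 3: distribute  */
        s[i].n = n1;
        s[i].array1 = pool + i * per_struct;
        s[i].array2 = s[i].array1 + n1;
    }
    return s;
}

void struct1_free_many(struct struct1 *s)
{
    if (s) {
        free(s[0].array1);   /* s[0].array1 is the start of the pool */
        free(s);
    }
}
```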

Because the two arrays have the same type, there are many more options than that, based on creative use of the C99 flexible array member. I'd recommend you use a pointer only for the second array,

struct foo {
    uint16_t *array2;
    uint32_t  field;
    uint16_t  array1[];
};

and allocate memory for both at the same time:

struct foo *foo_new(const size_t length1, const size_t length2)
{
    struct foo *result;

    result = malloc( sizeof (struct foo)
                   + length1 * sizeof (uint16_t)
                   + length2 * sizeof (uint16_t) );
    if (!result)
        return NULL;

    result->array2 = result->array1 + length1;

    return result;
}

Note that with struct foo *bar, accessing element i in the two arrays uses the same notation, bar->array1[i] and bar->array2[i], respectively.
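A small usage sketch may make this concrete. It repeats struct foo and foo_new from above so that it compiles on its own; foo_demo is a hypothetical function added only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

struct foo {
    uint16_t *array2;
    uint32_t  field;
    uint16_t  array1[];     /* C99 flexible array member */
};

struct foo *foo_new(const size_t length1, const size_t length2)
{
    struct foo *result;

    result = malloc( sizeof (struct foo)
                   + length1 * sizeof (uint16_t)
                   + length2 * sizeof (uint16_t) );
    if (!result)
        return NULL;

    result->array2 = result->array1 + length1;
    return result;
}

/* Hypothetical demo: fill both arrays, sum them, release everything. */
uint32_t foo_demo(const size_t length1, const size_t length2)
{
    struct foo *bar = foo_new(length1, length2);
    uint32_t sum = 0;
    size_t i;

    if (!bar)
        return 0;

    for (i = 0; i < length1; i++) bar->array1[i] = 1;
    for (i = 0; i < length2; i++) bar->array2[i] = 2;

    for (i = 0; i < length1; i++) sum += bar->array1[i];
    for (i = 0; i < length2; i++) sum += bar->array2[i];

    free(bar);              /* one free releases the struct and both arrays */
    return sum;
}
```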


In the context of scientific computing, I would consider completely different options, depending on the access patterns. For example, if the two arrays are accessed in lockstep (in any direction), I would use

typedef  uint16_t  pair16[2];

struct bar {
    uint32_t  field;
    pair16    array[];
};

If the arrays were large, then copying them into temporary buffers (arrays of pair16 above, if accessed in lockstep) would possibly help, but with at most a few thousand entries, it is not likely to give a significant speed boost.
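For the lockstep layout, a minimal sketch of allocation and traversal could look like this. The names bar_new and bar_dot are hypothetical, and storing the pair count in field is an assumption made only for this example.

```c
#include <stdint.h>
#include <stdlib.h>

typedef  uint16_t  pair16[2];

struct bar {
    uint32_t  field;        /* assumption: holds the number of pairs */
    pair16    array[];
};

/* Hypothetical constructor: one malloc for the header and all pairs. */
struct bar *bar_new(const size_t length)
{
    struct bar *b = malloc(sizeof *b + length * sizeof (pair16));
    if (b)
        b->field = (uint32_t)length;
    return b;
}

/* Lockstep access: both values of entry i sit next to each other,
   so each iteration touches one contiguous 4-byte pair. */
uint32_t bar_dot(const struct bar *b)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < b->field; i++)
        sum += (uint32_t)b->array[i][0] * b->array[i][1];
    return sum;
}
```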

In cases where the access pattern of one array depends on the other, but you still do enough computation on each entry, it may be useful to compute the address of the next entry early and use the GCC built-in __builtin_prefetch() to tell the CPU you'll need it soon, before doing the computation on the current entry. It may reduce the data access latencies (although the access predictors are already pretty darn good on current processors).
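As a sketch of that idea, assume one array holds indices into the other. The function gather_sum and its parameters are hypothetical. __builtin_prefetch is a GCC built-in (Clang accepts it too); it is only a hint, and whether it helps at all has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum data[idx[i]] for i in [0, n). The address of the next entry is
   known one iteration ahead, so the CPU is asked to start loading it
   before the current entry is processed. */
uint32_t gather_sum(const uint16_t *data, const uint32_t *idx, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(&data[idx[i + 1]], 0, 1);  /* 0 = read, 1 = low temporal locality */

        sum += data[idx[i]];    /* stands in for the real per-entry computation */
    }
    return sum;
}
```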

With GCC (and to a lesser extent with other common compilers like Intel Compiler Collection, Portland Group, and Pathscale C compilers), I've noticed that code that manipulates pointers (instead of array pointers and array indexing) compiles to better machine code on x86 and x86-64. (The reason is actually quite simple: with array pointers and array indexing, you need at least two separate registers, and x86 has relatively few of those. Even x86-64 doesn't have that many of them. GCC in particular is not very strong at optimizing register usage, although it is much better now than in the version 3 era, so this seems to help a lot in some cases.) For example, if you were to access the first array in a struct foo sequentially, then

void do_something(struct foo *ref, const size_t length1)
{
    uint16_t       *array1 = ref->array1;
    uint16_t *const limit1 = ref->array1 + length1;  /* one past the last element of array1 */

    for (; array1 < limit1; array1++) {

        /* ... */

    }
}

Approach B is possible (why don't you just try it?), and it is better, not so much because of memory locality, but because malloc() has a cost, so the fewer times you call it, the better off you are. (Assuming that 'better' means 'faster', which, admittedly, is not necessarily the case.)

Memory locality is only marginally improved, since all memory blocks would most likely be contiguous (one after the other) in memory, so if you went with approach A your arrays would only be separated by block headers, which are not very big. (On the order of 32 bytes each, maybe a bit larger, but not by much.) The only situation in which your blocks would not be contiguous is if you had previously been doing malloc() and free(), so your memory would be fragmented.
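If you want to check this on your own system, a small sketch like the following prints where two consecutive allocations end up. The function name report_layout is hypothetical, and the numbers it prints depend entirely on the allocator and on what was allocated before.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate two uint16_t arrays back to back, as in approach A, and
   print their addresses. Returns 0 on success, -1 on failure. */
int report_layout(size_t n1, size_t n2)
{
    uint16_t *a = malloc(n1 * sizeof *a);
    uint16_t *b = malloc(n2 * sizeof *b);

    if (!a || !b) {
        free(a);
        free(b);
        return -1;
    }

    printf("array1 at %p (%lu bytes)\n", (void *)a,
           (unsigned long)(n1 * sizeof *a));
    printf("array2 at %p\n", (void *)b);

    /* Compare as integers: subtracting pointers into different
       allocations is undefined behaviour in C. */
    if ((uintptr_t)b > (uintptr_t)a)
        printf("bytes between end of array1 and start of array2: %lu\n",
               (unsigned long)((uintptr_t)b - (uintptr_t)a - n1 * sizeof *a));

    free(a);
    free(b);
    return 0;
}
```

Calling report_layout(10, 20) early in a program and again after many malloc()/free() calls gives a rough feel for how fragmentation changes the picture.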
