
How malloc asks for heap memory

I have a huge array that I allocate on the heap, since leaving it on the stack would cause an error. As far as I know, there are two ways to put it on the heap.

#1

int i;
int x = 10000, y = 10000;
double** array = (double**)malloc(sizeof(double*) * x);
if (array) {
    for (i = 0; i < x; i++) {
        array[i] = (double*)malloc(sizeof(double) * y);
    }
}

#2

double *array = (double*)malloc(sizeof(double) * x * y);

Now I was wondering which method is superior. I think #1 asks for x blocks of length y on the heap, which need not be next to each other, whereas #2 asks for a single huge block of x*y. So would #1 be superior, since it can be split up? Say the heap couldn't provide one huge stripe of length x*y but could provide x stripes of length y.

First, is this even true? Am I missing something about either method? Is my argument practical, or, even if true, is it an unlikely scenario? Is there an even better method?

Thanks for your insight.

You are correct that the first method may be more flexible, because it need not find a contiguous span of free memory of the total size, whereas the second one must. A possible adverse effect is that it may itself cause more fragmentation of the heap if the allocated slabs aren't contiguous: there will be regions of space between the slabs, within which future allocations will need to find room.

The second option, however, may exploit spatial and temporal locality. Basically, since more of the data sits right next to each other, there's an increased chance that the data you need will already be in the CPU caches, and as a result operating on this memory will be a lot faster.
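To make the locality point concrete, here is a minimal sketch of the single-block layout with manual row-major indexing. The helper names (grid_alloc, grid_at, grid_sum) are made up for illustration and do not come from the question or from any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate rows*cols doubles as ONE zeroed block.
   Returns NULL on a zero dimension, size overflow, or allocation failure. */
double *grid_alloc(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0 || rows > SIZE_MAX / sizeof(double) / cols)
        return NULL;
    return calloc(rows * cols, sizeof(double));
}

/* Element (i, j) sits at offset i*cols + j, so each row is one unbroken run. */
double *grid_at(double *g, size_t rows, size_t cols, size_t i, size_t j)
{
    assert(i < rows && j < cols);
    return &g[i * cols + j];
}

/* Row-major traversal: the inner loop walks consecutive addresses,
   which is the access pattern that benefits from spatial locality. */
double grid_sum(const double *g, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += g[i * cols + j];
    return total;
}
```

With this layout a single free(g) releases the whole array, whereas method #1 needs one free per row plus one for the pointer table.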

It depends on the memory allocator you use and the values of your x and y.

Memory allocators often cache small memory blocks and handle small allocations in user space, while forwarding larger allocation requests to the kernel via mmap.

Most memory allocators work like this:

void* malloc(size_t size)
{
    void* ret;
    if (size > THRESHOLD) {
        return large_alloc(size);    // forward to mmap
    }
retry:
    ret = small_alloc(size);         // handled in user space
    if (ret == NULL) {               // no small blocks left
        enlarge_heap();              // map more memory from kernel
        goto retry;
    }
    return ret;
}

In your case y == 10000, so each row in #1 asks for an 80000-byte memory block. In glibc's default memory allocator, the mmap threshold is 128kB, so this request tends to be handled in user space if the allocator has already cached enough memory. But #2 would invoke an mmap call, since it's larger than 128kB.
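The sizes quoted above can be checked with a few lines of arithmetic. This is only a sketch: ASSUMED_MMAP_THRESHOLD hard-codes the 128 KiB default mentioned here, and the helper names are made up for illustration. glibc can adjust its threshold at run time, so the constant is illustrative rather than authoritative.

```c
#include <assert.h>
#include <stddef.h>

/* The byte counts in the text assume 8-byte doubles. */
static_assert(sizeof(double) == 8, "this sketch assumes 8-byte doubles");

/* Assumption: the 128 KiB default cited in the text, hard-coded here. */
#define ASSUMED_MMAP_THRESHOLD ((size_t)128 * 1024)

/* Size of one row in method #1. */
size_t row_bytes(size_t cols)
{
    return cols * sizeof(double);
}

/* Size of the single block in method #2. */
size_t block_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(double);
}

/* Nonzero if a request of this size is above the assumed threshold. */
int exceeds_threshold(size_t bytes)
{
    return bytes > ASSUMED_MMAP_THRESHOLD;
}
```

For x == y == 10000, row_bytes gives 80000 (below the threshold) and block_bytes gives 800000000 (far above it), which matches the reasoning in this answer.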

But in your example x == 10000, so you are comparing a single mmap system call against 10000 allocations in user space. Trust me, #2 is much faster:

An allocation in a highly optimized allocator implementation always takes more than 70 cycles on a modern x86 machine, so 10000 allocations would consume more than 700000 cycles. But a typical mmap call latency should be no more than 100000 cycles. So #2 is better.
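If you would rather measure than trust cycle estimates, a rough timing sketch along these lines may help. The function names are made up for illustration, clock() reports CPU time at coarse resolution, and an optimizing compiler may remove a malloc/free pair whose result is never used, so treat any numbers as indicative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Method #1: one malloc per row. Returns CPU seconds spent allocating,
   or -1.0 if any allocation fails. Everything is freed before returning. */
double time_row_allocs(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    clock_t start = clock();
    double **a = malloc(rows * sizeof *a);
    if (a == NULL)
        return -1.0;
    size_t done = 0;
    while (done < rows && (a[done] = malloc(cols * sizeof **a)) != NULL)
        done++;
    clock_t stop = clock();
    for (size_t i = 0; i < done; i++)
        free(a[i]);
    free(a);
    return done == rows ? (double)(stop - start) / CLOCKS_PER_SEC : -1.0;
}

/* Method #2: a single malloc for the whole rows*cols block.
   Returns CPU seconds spent allocating, or -1.0 on failure. */
double time_block_alloc(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    clock_t start = clock();
    double *a = malloc(rows * cols * sizeof *a);
    clock_t stop = clock();
    if (a == NULL)
        return -1.0;
    free(a);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling both with rows == cols == 10000 reproduces the scenario in the question. Note that this times only the allocation calls, not the later cost of touching the memory.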

For other allocators such as TCMalloc it's a little bit different. TCMalloc has no such threshold and always tries to handle large allocation requests in user space in its Span structure. So #2 is definitely much better, since it only needs a single allocation.


I agree that #1 is more flexible, since #2 requires the allocator to find a large contiguous memory block. But remember that it only has to be contiguous in virtual memory; the physical pages are mapped on demand when you first touch them, so it need not be contiguous in physical memory. And it is often easy to find a contiguous area of 8 * 10000 * 10000 bytes in virtual memory.
