
Memory defragmentation/heap compaction - commonplace in managed languages, but not in C++. Why?

Original 2020-10-14 08:04:19 6 2 c++/ memory-management

I've been reading up a little on zero-pause garbage collectors for managed languages. From what I understand, one of the most difficult things to do without stop-the-world pauses is heap compaction. Only very few collectors (e.g. Azul C4, ZGC) seem to be doing, or at least approaching, this.

So, most GCs introduce dreaded stop-the-world pauses to compact the heap (bad!). Avoiding those pauses seems extremely difficult, and does come with a performance/throughput penalty. So either way, this step seems rather problematic.

And yet - as far as I know, most if not all GCs still do compact the heap occasionally. I've yet to see a modern GC that doesn't do this by default. Which leads me to believe: it has to be really, really important. If it weren't, surely the tradeoff wouldn't be worth it.

At the same time, I have never seen anyone do memory defragmentation in C++. I'm sure some people somewhere do, but - correct me if I am wrong - it does not at all seem to be a common concern. I could of course imagine static memory somewhat lessens this, but surely, most codebases would do a fair amount of dynamic allocations?!

So I'm curious, why is that?

Are my assumptions (very important in managed languages; rarely done in C++) even correct? If yes, is there any explanation I'm missing?

2 answers

Garbage collection can compact the heap because it knows where all of the pointers are. After all, it just finished tracing them. That means that it can move objects around and adjust the pointers (references) to the new location.

However, C++ cannot do that, because it doesn't know where all the pointers are. If the memory allocation library moved things around, there could be dangling pointers to the old locations.
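To make the dangling-pointer point concrete, here is a minimal sketch using a container that is allowed to relocate its own elements. `std::vector` moves its storage when it grows, and nothing in the language updates raw pointers that were taken earlier. The function name `growth_relocated_storage` is made up for this illustration; a compacting allocator would create the same situation for every pointer in the program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if growing the vector moved its storage, i.e. if a raw
// pointer taken before the growth would now be dangling.
bool growth_relocated_storage() {
    std::vector<int> values(4, 7);

    // Record the old address as an integer so we never touch a stale pointer.
    const auto before = reinterpret_cast<std::uintptr_t>(values.data());
    const std::size_t old_capacity = values.capacity();

    // Growing past the current capacity forces a reallocation: the elements
    // are moved into a new, larger block and the old block is freed.
    values.resize(old_capacity * 4 + 1);
    assert(values.capacity() > old_capacity);

    const auto after = reinterpret_cast<std::uintptr_t>(values.data());

    // The vector fixed up its own internal pointer, but nobody else's.
    return before != after;
}
```

An index such as `values[2]` stays meaningful across the move, while an `int*` taken before it does not, which is why relocation in C++ usually goes through some form of indirection.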

Oh, and for long-running processes, C++ can indeed suffer from memory fragmentation. This was more of a problem on 32-bit systems, where a request to the OS could fail because the process had already used up all of the available 1 MB memory blocks. On 64-bit systems it is almost impossible to create so many memory mappings that there is nowhere to put a new one. However, if you ended up with a single 16-byte allocation in each 4K memory page, that's a lot of wasted space.

C and C++ applications solve that by using storage pools. A web server, for example, would start a pool with each new request. At the end of that web request, everything in the pool gets destroyed. The pool makes a nice, constant-sized block of RAM that gets reused over and over without fragmentation.
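As a rough sketch of the per-request pool idea, assuming a simple bump allocator over one fixed buffer: the class `RequestArena` and the helper `arena_reuses_same_memory` are hypothetical names, not taken from any particular server. The sketch only suits trivially destructible objects, because `reset()` does not run destructors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-request arena: one fixed buffer, handed out by bumping
// an offset. Nothing is freed individually.
class RequestArena {
public:
    explicit RequestArena(std::size_t capacity) : buffer_(capacity), used_(0) {}

    // Returns nullptr when the arena is exhausted. `align` must be a power
    // of two; alignment is computed relative to the start of the buffer.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > buffer_.size() || size > buffer_.size() - start) {
            return nullptr;
        }
        used_ = start + size;
        return buffer_.data() + start;
    }

    // End of request: everything is released at once.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};

// Two "requests" in a row end up using the same addresses.
bool arena_reuses_same_memory() {
    RequestArena arena(1024);
    void* first = arena.allocate(100);
    arena.allocate(200);
    arena.reset();                      // request finished
    void* again = arena.allocate(100);  // next request starts over
    return first != nullptr && first == again;
}
```

Because the buffer is never returned to the general-purpose heap between requests, short-lived request data cannot leave holes in it.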

Garbage-collected applications tend to use recycling pools as well, because doing so avoids the strain of running a big GC trace and reclaim at the end of a connection.

One method that some old operating systems like Apple's Mac OS 9 used before virtual memory was a thing is handles. Instead of a memory pointer, allocation returned a handle. That handle was an indirect reference: a pointer to a pointer to the real object in memory. When the operating system needed to compact memory or swap it to disk, it updated the pointer behind the handle, so the handle held by the application stayed valid.

I have actually implemented a similar system in C++ using an array of handles into a shared memory map pseudo-database. When the map was compacted, the handle table was scanned for affected entries and updated.
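A minimal sketch of how such a handle table might look, assuming a single byte buffer and integer handles. This is not the implementation described above; `HandleHeap` and all of its members are names invented for the example. Dead table entries are never recycled here, which a real version would want to fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical relocatable heap: callers hold handles (table indexes),
// never raw pointers, so blocks can be moved during compaction.
class HandleHeap {
public:
    using Handle = std::size_t;
    static constexpr Handle kInvalid = static_cast<Handle>(-1);

    explicit HandleHeap(std::size_t capacity) : storage_(capacity), top_(0) {}

    Handle allocate(std::size_t size) {
        if (size > storage_.size() - top_) return kInvalid;
        table_.push_back({top_, size, true});
        top_ += size;
        return table_.size() - 1;
    }

    void release(Handle h) { table_[h].live = false; }

    // The returned pointer is valid only until the next compact().
    unsigned char* resolve(Handle h) {
        return storage_.data() + table_[h].offset;
    }

    // Slide live blocks toward the start and patch their table entries.
    // Entries are always in ascending offset order, so one pass is enough.
    void compact() {
        std::size_t write = 0;
        for (Entry& e : table_) {
            if (!e.live) continue;
            std::memmove(storage_.data() + write,
                         storage_.data() + e.offset, e.size);
            e.offset = write;
            write += e.size;
        }
        top_ = write;
    }

    std::size_t bytes_in_use() const { return top_; }

private:
    struct Entry {
        std::size_t offset;
        std::size_t size;
        bool live;
    };
    std::vector<unsigned char> storage_;
    std::vector<Entry> table_;
    std::size_t top_;
};

// The third block moves to the front, yet its handle still finds the data.
bool handles_survive_compaction() {
    HandleHeap heap(64);
    HandleHeap::Handle a = heap.allocate(16);
    HandleHeap::Handle b = heap.allocate(16);
    HandleHeap::Handle c = heap.allocate(16);
    std::memset(heap.resolve(c), 0x5A, 16);
    heap.release(a);
    heap.release(b);
    heap.compact();
    return heap.bytes_in_use() == 16 && heap.resolve(c)[15] == 0x5A;
}
```

The price is one extra lookup on every access and the discipline of never caching the result of `resolve()` across a compaction.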

Generic memory compaction is generally neither useful nor desirable because of its cost.

What may be desirable is to have no wasted or fragmented memory, and that can be achieved by methods other than memory compaction.

In C++ one can come up with a different allocation approach for the objects that actually cause fragmentation in a specific application, e.g. double pointers or double indexes to allow for object relocation, or object pools or arenas that prevent or minimize fragmentation. Such solutions for specific object types are superior to generic garbage collection because they employ application/business-specific knowledge, which makes it possible to minimize the scope and cost of object storage maintenance and to perform it at the most appropriate times.
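For the object pool variant, a minimal sketch could look like the following, assuming all objects of one type live in equally sized slots addressed by index. `IndexPool` is a hypothetical name, and `T` must be default-constructible in this simplified form. Since every slot has the same size, any freed slot fits any later request, so the pool itself cannot fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity pool of equally sized slots, addressed by index.
template <typename T>
class IndexPool {
public:
    explicit IndexPool(std::size_t capacity) : slots_(capacity) {
        free_.reserve(capacity);
        // Push indexes in reverse so that slot 0 is handed out first.
        for (std::size_t i = capacity; i > 0; --i) free_.push_back(i - 1);
    }

    // Writes the acquired slot index to `index`; returns false when full.
    bool acquire(std::size_t& index) {
        if (free_.empty()) return false;
        index = free_.back();
        free_.pop_back();
        return true;
    }

    // Makes the slot available again; the most recently freed slot is reused first.
    void release(std::size_t index) { free_.push_back(index); }

    T& at(std::size_t index) { return slots_[index]; }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> slots_;
    std::vector<std::size_t> free_;
};

// A released slot is handed out again instead of growing the pool.
bool freed_slot_is_reused() {
    IndexPool<int> pool(2);
    std::size_t a = 0, b = 0, c = 0;
    pool.acquire(a);
    pool.acquire(b);
    const bool was_full = !pool.acquire(c);
    pool.release(a);
    const bool got_slot = pool.acquire(c);
    return was_full && got_slot && c == a;
}
```

Using indexes rather than pointers also leaves room to move the whole pool later, for example when it needs to grow.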

One study found that garbage-collected languages require 5 times more memory to match the performance of equivalent non-GC programs. Memory fragmentation is more severe in GC languages.