
How to implement garbage collection in C++

I saw some posts about implementing GC in C, and some people said it's impossible because C is weakly typed. I want to know how to implement GC in C++.

I'd like a general idea of how to do it. Thank you very much!

This is a Bloomberg interview question a friend told me about. He did badly on it at the time. We'd like to hear your ideas about it.

Garbage collection is difficult in both C and C++, for a few reasons:

  1. Pointers can be typecast to integers and vice versa. This means that a block of memory may be reachable only by taking an integer, casting it to a pointer, and then dereferencing it. A garbage collector has to be careful not to treat a block as unreachable when it can in fact still be reached.

  2. Pointers are not opaque. Many garbage collectors, such as stop-and-copy collectors, move blocks of memory around or compact them to save space. Since you can explicitly inspect pointer values in C and C++, this is difficult to implement correctly. If someone did something tricky such as casting a pointer to an integer, the collector would have to update that integer correctly whenever it moved the block.

  3. Memory management can be done explicitly. Any garbage collector has to take into account that the user is able to free blocks of memory explicitly at any time.

  4. In C++, allocation/deallocation is separate from object construction/destruction. A block of memory can be allocated with enough space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might live there. This matters especially for the standard library containers, which typically go through std::allocator to separate allocation from construction for efficiency reasons.

  5. Memory can be allocated from different areas. C and C++ can get memory from the built-in free store (malloc/free or new/delete), from the OS via mmap or other system calls, and, in the case of C++, from std::get_temporary_buffer (released with std::return_temporary_buffer; both were deprecated in C++17 and removed in C++20). A program might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.

  6. Pointers can point into the middle of objects or arrays. In many garbage-collected languages, such as Java, object references always point to the start of the object. In C and C++, pointers can point into the middle of arrays, and in C++ into the middle of objects (for example, when multiple inheritance is used). This can greatly complicate the logic for detecting what is still reachable.
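
To make points 1 and 6 concrete, here is a minimal sketch. The names hide, reveal, and kMask are made up for illustration, and the sketch assumes a platform that provides std::uintptr_t and on which converting a pointer to an integer and back yields the original pointer (true on mainstream platforms, but implementation-defined in general).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask; any non-zero constant works for the illustration.
constexpr std::uintptr_t kMask = 0x5A5A5A5A;

// Store a pointer as a disguised integer. A collector that scans memory for
// address-like bit patterns would not recognise the result as a pointer.
std::uintptr_t hide(int* p) {
    return reinterpret_cast<std::uintptr_t>(p) ^ kMask;
}

// Recover the original pointer from the disguised integer.
int* reveal(std::uintptr_t hidden) {
    return reinterpret_cast<int*>(hidden ^ kMask);
}

int hidden_pointer_example() {
    int* block = new int[4]{10, 20, 30, 40};

    // Interior pointer: refers to the middle of the array, not its start.
    int* interior = block + 2;
    assert(*interior == 30);

    // Imagine that from here on `hidden` is the only thing the program keeps.
    std::uintptr_t hidden = hide(block);
    int* recovered = reveal(hidden);
    int value = recovered[1];  // the block is still perfectly usable

    delete[] recovered;
    return value;
}
```

A collector that saw only the value of hidden would have no reason to believe the array is still in use, even though the program can get it back at any time.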

So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound: they assume that you won't, for example, take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any value in memory that is the size of a pointer could be a pointer, and so they sometimes refuse to free unreachable memory because there is a nonzero chance that something points to it.
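
The "any pointer-sized value might be a pointer" rule can be sketched roughly as follows. This is only an illustration of the idea, not how any particular collector is implemented; Block, looks_like_pointer_into, and conservatively_reachable are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of one heap block known to the collector.
struct Block {
    std::uintptr_t start;  // address of the first byte
    std::size_t size;      // size in bytes
};

// True if `word`, read as an address, falls anywhere inside the block.
// Interior addresses count too, not just the start of the block.
bool looks_like_pointer_into(std::uintptr_t word, const Block& b) {
    return word >= b.start && word - b.start < b.size;
}

// Conservative test: the block is kept alive if ANY scanned word could be a
// pointer into it, whether or not that word really is a pointer.
bool conservatively_reachable(const std::uintptr_t* words, std::size_t count,
                              const Block& b) {
    for (std::size_t i = 0; i < count; ++i) {
        if (looks_like_pointer_into(words[i], b)) {
            return true;
        }
    }
    return false;
}
```

With a block at address 0x1000 of size 64, a plain integer that happens to hold 0x1010 is enough to keep the block alive, which is exactly the kind of false retention described above.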

As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.

Interestingly, C++11 includes some new library functions (such as std::declare_reachable and std::declare_no_pointers) that allow the programmer to mark regions of memory as reachable and unreachable in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information. (These functions were later removed from the standard in C++23.) In the meantime, though, you'll need to be extremely careful not to break any of the above rules.

C isn't C++, but both have the same "weakly typed" issues. It's not the implicit typecasts that cause trouble, though, but the tendency towards "punning" (subverting the type system), especially in data structure libraries.

There are garbage collectors out there for C and/or C++. The Boehm conservative collector is probably the best known. It's conservative in that, if it sees a bit pattern that looks like a pointer to some object, it doesn't collect that object. That value might be some other type of value entirely, in which case the object could have been collected, but "conservative" means playing safe.

Even a conservative collector can be fooled, though, if you use calculated pointers. There's a data structure, for example, where every list node has a field giving the difference between the next-node and previous-node addresses. The idea is to give doubly linked list behaviour with a single link per node, at the expense of more complex iterators. Since there's no explicit pointer anywhere to most of the nodes, they may be wrongly collected.
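
Here is a rough sketch of that kind of list. It uses the common XOR variant (each node stores the XOR of its neighbours' addresses) rather than the arithmetic difference mentioned above; the idea is the same. XorNode, addr, node_at, link_nodes, and traverse are names invented for this sketch, and it assumes std::uintptr_t is available.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node type: a single link field encodes both neighbours.
struct XorNode {
    int value;
    std::uintptr_t link;  // address(prev) XOR address(next); "no node" counts as 0
};

// Address of a node as an integer, with null mapped to 0.
inline std::uintptr_t addr(const XorNode* n) {
    return n ? reinterpret_cast<std::uintptr_t>(n) : 0;
}

// Inverse of addr(): 0 maps back to a null pointer.
inline XorNode* node_at(std::uintptr_t a) {
    return a ? reinterpret_cast<XorNode*>(a) : nullptr;
}

// Link `count` nodes stored in an array into one list, in array order.
void link_nodes(XorNode* nodes, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const XorNode* prev = (i > 0) ? &nodes[i - 1] : nullptr;
        const XorNode* next = (i + 1 < count) ? &nodes[i + 1] : nullptr;
        nodes[i].link = addr(prev) ^ addr(next);
    }
}

// Walk the list from one end. The next address is recovered by XOR-ing the
// link field with the address of the node we just came from.
std::vector<int> traverse(XorNode* end) {
    std::vector<int> out;
    XorNode* prev = nullptr;
    XorNode* cur = end;
    while (cur != nullptr) {
        out.push_back(cur->value);
        XorNode* next = node_at(cur->link ^ addr(prev));
        prev = cur;
        cur = next;
    }
    return out;
}
```

The same traverse function works from either end of the list. In a heap-allocated version the program would hold explicit pointers only to the end nodes; no word in memory contains the address of an interior node, so a scanning collector has nothing to find.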

Of course, this is a very exceptional special case.

More importantly, you can have either reliable destructors or garbage collection, not both. When a cycle of garbage objects is collected, the collector cannot decide which destructor to call first.

Since the RAII pattern is pervasive in C++, and it relies on destructors, there is IMO a conflict. There may be valid exceptions, but my view is that if you want garbage collection, you should use a language that's designed from the ground up for garbage collection (Java, C#, ...).

Look into the Boehm Garbage Collector.

You could either use smart pointers or create your own container object that tracks references and handles memory allocation, etc. Smart pointers would probably be preferable. Often you can avoid dynamic heap allocation altogether.

For example:

char* pCharArray = new char[128];
// do some stuff with characters
delete [] pCharArray;

The danger with the above is that if anything throws between the new and the delete, the delete will never be executed. Something like the above could easily be replaced with safer "garbage collected" code:

#include <vector>

std::vector<char> charArray(128);  // freed automatically when charArray goes out of scope
// do some stuff with characters
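
As a rough check of that claim, the sketch below counts live objects and confirms that nothing is left behind when an exception interrupts the work. Tracked, may_throw, vector_version, unique_ptr_version, and live_after are all hypothetical names used only for this illustration.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical helper type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Stands in for "do some stuff" that fails part-way through.
void may_throw() { throw std::runtime_error("failure before cleanup"); }

// std::vector: its destructor runs during stack unwinding and destroys the elements.
void vector_version() {
    std::vector<Tracked> items(3);
    may_throw();
}

// std::unique_ptr<T[]>: calls delete[] on the array during stack unwinding.
void unique_ptr_version() {
    std::unique_ptr<Tracked[]> items(new Tracked[3]);
    may_throw();
}

// Runs fn, swallows the expected exception, and reports how many Tracked
// objects are still alive afterwards.
template <typename Fn>
int live_after(Fn fn) {
    Tracked::live = 0;
    try {
        fn();
    } catch (const std::runtime_error&) {
        // Expected: the point is what happens to the objects, not the error.
    }
    return Tracked::live;
}
```

Both live_after(vector_version) and live_after(unique_ptr_version) report zero live objects. A version using a bare new[] with the delete[] placed after may_throw() would leave all three objects alive.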

Bloomberg is notorious for interview questions that are irrelevant from a practical coding standpoint. Like most interviewers, though, they are more concerned with how you think and how well you communicate than with the actual solution.

The claim you saw is false; the Boehm collector supports C and C++. I suggest reading the Boehm collector's documentation (particularly this page) for a good overview of how one might write a garbage collector in C or C++.

You can read about the std::shared_ptr class template.

It implements a simple form of reference-counting garbage collection.
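
A small sketch of what that looks like in practice, using only std::shared_ptr and std::weak_ptr from the standard <memory> header. The Node type and the two function names are invented for this example.

```cpp
#include <memory>

// Hypothetical node with one owning link and one non-owning back link.
struct Node {
    std::shared_ptr<Node> next;  // owning: keeps the target alive
    std::weak_ptr<Node> back;    // non-owning: does not affect the count
};

// Returns the reference count observed while two shared_ptrs own one object.
long count_with_two_owners() {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = a;  // copying raises the count to 2
    return a.use_count();
}

// Builds first -> second (owning) and second -> first (weak), drops both
// handles, and reports whether the first node was actually destroyed.
bool weak_back_link_allows_cleanup() {
    std::weak_ptr<Node> observer;
    {
        auto first = std::make_shared<Node>();
        auto second = std::make_shared<Node>();
        first->next = second;  // first owns second
        second->back = first;  // second only observes first
        observer = first;
    }  // both handles go out of scope here
    return observer.expired();
}
```

If back were a std::shared_ptr as well, the two nodes would own each other and neither count would ever reach zero. That is the usual limitation of pure reference counting compared with a tracing collector.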

If you want a real garbage collector, you can overload operator new.

Create a struct similar to shared_ptr and call it Object.

This will wrap each newly created object. By overloading its operators, you can then control the GC.

All you need to do then is implement one of the many GC algorithms.
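
As one possible starting point, here is a minimal mark-and-sweep sketch. It is deliberately simplified and rests on several assumptions: objects are allocated through a hypothetical Heap class rather than an overloaded operator new, roots are registered by hand, and Object holds plain pointers to other managed objects instead of wrapping user data.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical managed object: a mark bit plus its outgoing references.
struct Object {
    bool marked = false;
    std::vector<Object*> refs;
};

class Heap {
public:
    // Every allocation goes through the heap so the collector knows all objects.
    Object* allocate() {
        objects_.push_back(std::unique_ptr<Object>(new Object()));
        return objects_.back().get();
    }

    // Roots are registered manually in this sketch.
    void add_root(Object* o) { roots_.push_back(o); }
    void remove_root(Object* o) {
        roots_.erase(std::remove(roots_.begin(), roots_.end(), o), roots_.end());
    }

    std::size_t live_count() const { return objects_.size(); }

    // Mark everything reachable from the roots, then free everything else.
    // Returns the number of objects freed.
    std::size_t collect() {
        // Mark phase: iterative traversal with an explicit work list.
        std::vector<Object*> pending(roots_);
        while (!pending.empty()) {
            Object* o = pending.back();
            pending.pop_back();
            if (o->marked) continue;  // already visited; also stops on cycles
            o->marked = true;
            for (Object* r : o->refs) pending.push_back(r);
        }

        // Sweep phase: destroy unmarked objects, then clear marks on survivors.
        const std::size_t before = objects_.size();
        objects_.erase(
            std::remove_if(objects_.begin(), objects_.end(),
                           [](const std::unique_ptr<Object>& o) { return !o->marked; }),
            objects_.end());
        for (auto& o : objects_) o->marked = false;
        return before - objects_.size();
    }

private:
    std::vector<std::unique_ptr<Object>> objects_;
    std::vector<Object*> roots_;
};
```

Because reachability is computed from the roots, two objects that refer only to each other are both freed, which reference counting alone cannot do. A real collector would also have to find roots automatically (stack, globals, registers), which is where the difficulties listed in the other answers come in.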
