
How does Boehm GC work for a C program?

I have been looking at Boehm GC, the garbage collector for C/C++.

I know the mark-and-sweep algorithm. What I'm curious about is how it picks out only the pointers from the whole of a C program's memory. My understanding is that C memory is just a plain byte array. Is it possible to determine whether a value in memory is a pointer or not?

The Boehm GC is a conservative collector, which means it treats any value that could be a pointer into its heap as if it were one. As a result it can find false-positive references, such as an integer that coincidentally has the value of an address in the heap, and some blocks may stay in memory longer than they would with a non-conservative (precise) collector.
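To make the "looks like a pointer" test concrete, here is a minimal sketch in C. It is not Boehm GC's actual code; `HeapRange`, `looks_like_heap_pointer` and `count_candidates` are hypothetical names, and the heap is assumed to be a single contiguous address range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap description: one contiguous address range. */
typedef struct {
    uintptr_t start; /* address of the first heap byte */
    uintptr_t end;   /* one past the last heap byte */
} HeapRange;

/* Conservative test: a word is treated as a pointer if its value
   happens to lie inside the heap, whatever its real type is. */
int looks_like_heap_pointer(const HeapRange *h, uintptr_t word) {
    assert(h->start <= h->end);
    return word >= h->start && word < h->end;
}

/* Scan untyped memory word by word and count the candidate pointers. */
size_t count_candidates(const HeapRange *h, const uintptr_t *words, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (looks_like_heap_pointer(h, words[i])) {
            hits++;
        }
    }
    return hits;
}
```

An integer whose value equals a heap address passes this test exactly as a real pointer does, which is where the false positives come from.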

Here's a description from Boehm's page:

The garbage collector uses a modified mark-sweep algorithm. Conceptually it operates roughly in four phases, which are performed occasionally as part of a memory allocation:

  1. Preparation: Each object has an associated mark bit. Clear all mark bits, indicating that all objects are potentially unreachable.
  2. Mark phase: Marks all objects that can be reachable via chains of pointers from variables. Often the collector has no real information about the location of pointer variables in the heap, so it views all static data areas, stacks and registers as potentially containing pointers. Any bit patterns that represent addresses inside heap objects managed by the collector are viewed as pointers. Unless the client program has made heap object layout information available to the collector, any heap objects found to be reachable from variables are again scanned similarly.
  3. Sweep phase: Scans the heap for inaccessible, and hence unmarked, objects, and returns them to an appropriate free list for reuse. This is not really a separate phase; even in non-incremental mode this operation is usually performed on demand during an allocation that discovers an empty free list. Thus the sweep phase is very unlikely to touch a page that would not have been touched shortly thereafter anyway.
  4. Finalization phase: Unreachable objects which had been registered for finalization are enqueued for finalization outside the collector.
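The preparation, mark and sweep phases above can be sketched over a toy heap. This is only an illustration under simplifying assumptions, not the collector's real implementation: the heap is a fixed pool of equally sized slots, finalization is left out, and every name (`Object`, `toy_alloc`, `find_object`, `mark_word`, `toy_collect`) is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 8     /* number of object slots in the toy heap */
#define PAYLOAD_WORDS 2 /* words of untyped user data per object */

typedef struct {
    int in_use;                       /* is this slot currently allocated? */
    int marked;                       /* the mark bit */
    uintptr_t payload[PAYLOAD_WORDS]; /* may hold pointers or plain integers */
} Object;

Object pool[POOL_SIZE];

/* Hand out the first free slot, zeroed, or NULL if the pool is full. */
Object *toy_alloc(void) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            memset(&pool[i], 0, sizeof pool[i]);
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Map any address inside a live object (interior pointers included)
   to that object; return NULL for values outside the heap. */
Object *find_object(uintptr_t word) {
    uintptr_t lo = (uintptr_t)&pool[0];
    uintptr_t hi = (uintptr_t)&pool[POOL_SIZE];
    if (word < lo || word >= hi) {
        return NULL;
    }
    Object *obj = &pool[(word - lo) / sizeof(Object)];
    return obj->in_use ? obj : NULL;
}

/* Mark phase: treat the word as a possible pointer, then scan the
   object it refers to in the same conservative way. */
void mark_word(uintptr_t word) {
    Object *obj = find_object(word);
    if (obj == NULL || obj->marked) {
        return;
    }
    obj->marked = 1;
    for (size_t i = 0; i < PAYLOAD_WORDS; i++) {
        mark_word(obj->payload[i]);
    }
}

/* Run one collection and return the number of slots freed. */
size_t toy_collect(const uintptr_t *roots, size_t nroots) {
    assert(roots != NULL || nroots == 0);
    size_t freed = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) { /* 1. preparation */
        pool[i].marked = 0;
    }
    for (size_t i = 0; i < nroots; i++) {    /* 2. mark */
        mark_word(roots[i]);
    }
    for (size_t i = 0; i < POOL_SIZE; i++) { /* 3. sweep */
        if (pool[i].in_use && !pool[i].marked) {
            pool[i].in_use = 0;
            freed++;
        }
    }
    return freed;
}
```

Here `roots` stands in for the stack, registers and static data, which a real collector scans in place. Passing an integer that equals an address inside an object keeps that object alive, just as a genuine pointer would.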

You should also know that the Boehm GC works from a set of "roots", which are the starting points for the mark phase. The stack and registers are roots automatically and, as the description above says, so are the static data areas on supported platforms, so ordinary global pointers are normally found without any extra work. Memory regions the collector cannot discover by itself can be registered explicitly with GC_add_roots.


EDIT: In the comments, some concerns were raised about conservative collectors in general. It is true that integers that look like heap pointers to the collector can cause memory not to be released. This is not as big a problem as you might think. Most scalar integers in a program are used for counts and sizes and are fairly small, so they do not look like heap pointers. You would mostly run into problems with arrays containing bitmaps, strings, floating-point data, or anything of that sort. Boehm GC lets you allocate such a block with GC_MALLOC_ATOMIC, which tells the collector that the block will not contain any pointers and so does not need to be scanned. If you look in gc_typed.h, you will also find ways to specify which parts of a block may contain pointers.
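The effect of a pointer-free block on marking can be modelled in a few lines. This is a toy model rather than the library's code: `Block`, `block_containing`, `mark_from`, `clear_marks` and the `pointer_free` flag are invented for the illustration, with the flag standing in for what allocating with GC_MALLOC_ATOMIC tells the real collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS 4 /* blocks in the toy heap */
#define NWORDS 4  /* words of data per block */

typedef struct {
    int marked;              /* the mark bit */
    int pointer_free;        /* set for blocks allocated as "atomic" */
    uintptr_t words[NWORDS]; /* bitmap, string or numeric data, or pointers */
} Block;

Block blocks[NBLOCKS];

/* Return the block that contains the given address, or NULL. */
Block *block_containing(uintptr_t word) {
    uintptr_t lo = (uintptr_t)&blocks[0];
    uintptr_t hi = (uintptr_t)&blocks[NBLOCKS];
    if (word < lo || word >= hi) {
        return NULL;
    }
    return &blocks[(word - lo) / sizeof(Block)];
}

/* Mark the block a word refers to. The contents of a pointer-free
   block are never scanned, so data inside it cannot retain anything. */
void mark_from(uintptr_t word) {
    Block *b = block_containing(word);
    if (b == NULL || b->marked) {
        return;
    }
    b->marked = 1;
    if (b->pointer_free) {
        return;
    }
    for (size_t i = 0; i < NWORDS; i++) {
        mark_from(b->words[i]);
    }
}

/* Reset every mark bit before a new marking pass. */
void clear_marks(void) {
    for (size_t i = 0; i < NBLOCKS; i++) {
        blocks[i].marked = 0;
    }
}
```

If a block holds raw data that happens to equal the address of another block, marking it with `pointer_free` unset also marks the other block, while with the flag set only the first block is marked.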

That said, a fundamental limitation of a conservative collector is that it cannot safely move objects during collection: a value that only looks like a pointer might really be an integer, so rewriting it to a new address is not safe. This means you won't get any of the benefits of compaction, such as reduced fragmentation and improved cache performance.
