Why garbage collection when RAII is available?

I hear talk of C++14 introducing a garbage collector in the C++ standard library itself. What is the rationale behind this feature? Isn't this the reason that RAII exists in C++?

  • How will the presence of a standard library garbage collector affect RAII semantics?
  • How does it matter to me (the programmer) or to the way I write C++ programs?

Garbage collection and RAII are useful in different contexts. The presence of GC should not affect your use of RAII. Since RAII is well-known, I give two examples where GC is handy.


Garbage collection would be a great help in implementing lock-free data structures.

[...] it turns out that deterministic memory freeing is quite a fundamental problem in lock-free data structures. (from Lock-Free Data Structures by Andrei Alexandrescu)

Basically, the problem is that you have to make sure you are not deallocating memory while a thread is reading it. That's where GC becomes handy: it can look at the threads and only do the deallocation when it is safe. Please read the article for details.

Just to be clear here: it doesn't mean that the WHOLE WORLD should be garbage collected as in Java; only the relevant data should be garbage collected accurately.


In one of his presentations, Bjarne Stroustrup also gave a good, valid example where GC becomes handy. Imagine an application written in C/C++, 10M SLOC in size. The application works reasonably well (fairly bug free) but it leaks. You have neither the resources (man-hours) nor the functional knowledge to fix this. The source code is somewhat messy legacy code. What do you do? I agree that sweeping the problem under the rug with GC is perhaps the easiest and cheapest way out.


As has been pointed out by sasha.sochka, the garbage collector will be optional.

My personal concern is that people would start using GC the way it is used in Java, write sloppy code, and garbage collect everything. (I have the impression that shared_ptr has already become the default 'go to' even in cases where unique_ptr or, hell, stack allocation would do.)

I agree with @DeadMG that there is no GC in the current C++ standard, but I would like to add the following quotation from B. Stroustrup:

When (not if) automatic garbage collection becomes part of C++, it will be optional

So Bjarne is sure that it will be added in the future. At the very least, the chairman of the EWG (Evolution Working Group), who is one of the most important committee members (and, more importantly, the language's creator), wants to add it.

Unless he changes his opinion, we can expect it to be added and implemented in the future.

There are some algorithms which are complicated/inefficient/impossible to write without a GC. I suspect this is the major selling point for GC in C++, and I can't ever see it being used as a general-purpose allocator.

Why not a general-purpose allocator?

First, we have RAII, and most (including me) seem to believe that this is a superior method of resource management. We like determinism because it makes writing robust, leak-free code a lot simpler and makes performance predictable.
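To make the determinism point concrete, here is a minimal sketch. `Tracer` and `run_scope` are hypothetical names invented for this illustration; the only language rule relied on is that automatic objects are destroyed at the end of their scope, in reverse order of construction.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records acquisition and release in a caller-owned log.
class Tracer {
public:
    Tracer(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~Tracer() { log_.push_back("release " + name_); }

    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Runs one scope holding two resources and returns the observed event order.
std::vector<std::string> run_scope() {
    std::vector<std::string> log;
    {
        Tracer file("file", log);
        Tracer lock("lock", log);
        log.push_back("work");
        assert(log.size() == 3);  // nothing has been released yet
    }  // "lock" is released first, then "file": reverse order of construction
    return log;
}
```

The release points are fixed by the scope, not by a collector's schedule, which is what makes resource lifetime and cost predictable in the sense described above.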

Second, you'll need to place some very un-C++-like restrictions on how you can use memory. For instance, you'd need at least one reachable, un-obfuscated pointer. Obfuscated pointers, as are popular in common tree container libraries (using alignment-guaranteed low bits for color flags) among others, won't be recognizable by the GC.
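As an illustration of what such an obfuscated pointer can look like, here is a small sketch of a tree node that packs a red/black color flag into the low bit of its parent pointer. The names (`Node`, `set_parent_and_color`, and so on) are made up for this example, and it assumes the usual flat mapping between pointers and `std::uintptr_t`, which is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

enum Color : std::uintptr_t { Black = 0, Red = 1 };

struct Node {
    int key;
    // Parent address with the color stored in bit 0. This works only because
    // Node is aligned to more than one byte, so bit 0 of a real address is 0.
    std::uintptr_t parent_and_color;
};

inline void set_parent_and_color(Node& n, Node* parent, Color c) {
    const auto bits = reinterpret_cast<std::uintptr_t>(parent);
    assert((bits & 1u) == 0);  // alignment guarantees a free low bit
    n.parent_and_color = bits | c;
}

inline Node* parent_of(const Node& n) {
    // Mask the flag off again before using the value as a pointer.
    return reinterpret_cast<Node*>(n.parent_and_color & ~std::uintptr_t{1});
}

inline Color color_of(const Node& n) {
    return static_cast<Color>(n.parent_and_color & 1u);
}
```

When the node is red, the stored word is not a valid address of any object, so a collector scanning memory for pointers would not recognize it as a reference to the parent.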

Related to that, the things which make modern GCs so usable are going to be very difficult to apply to C++ if you support any number of obfuscated pointers. Generational defragmenting GCs are really cool, because allocating is extremely cheap (essentially just incrementing a pointer) and eventually your allocations get compacted into something smaller with improved locality. To do this, objects need to be movable.
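The "essentially just incrementing a pointer" part can be sketched as a bump allocator. This is only the allocation half of the story, under the assumption of a fixed-size arena; `BumpArena` is a hypothetical name, and the compaction that a generational collector would perform is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size arena: allocation only advances an offset.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : storage_(capacity), offset_(0) {}

    // Returns nullptr when the arena is exhausted. `align` must be a power of two.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const auto base = reinterpret_cast<std::uintptr_t>(storage_.data());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        // Round the next free address up to the requested alignment.
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > storage_.size() || size > storage_.size() - start) {
            return nullptr;
        }
        offset_ = start + size;  // the "bump"
        return storage_.data() + start;
    }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t offset_;
};
```

There is no per-object free here. In a moving collector, space is reclaimed by copying the survivors elsewhere and resetting the offset, which is exactly why the objects have to be movable.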

To make an object safely movable, the GC needs to be able to update all the pointers to it. It won't be able to find obfuscated ones. This could be accommodated, but it wouldn't be pretty (probably a gc_pin type or similar, used like the current std::lock_guard, whenever you need a raw pointer). Usability would be out the door.

Without making things movable, a GC would be significantly slower and less scalable than what you're used to elsewhere.

With usability reasons (resource management) and efficiency reasons (fast, movable allocations) out of the way, what else is GC good for? Certainly not general-purpose use. Enter lock-free algorithms.

Why lock-free?

Lock-free algorithms work by letting an operation under contention go temporarily "out of sync" with the data structure and detecting/correcting this at a later step. One effect of this is that, under contention, memory might be accessed after it has been deleted. For example, if you have multiple threads competing to pop a node from a LIFO, it is possible for one thread to pop and delete the node before another thread has realized the node was already taken:

Thread A:

  • Get pointer to root node.
  • Get pointer to next node from root node.
  • Suspend

Thread B:

  • Get pointer to root node.
  • Suspend

Thread A:

  • Pop node (replace the root node pointer with the next node pointer, if the root node pointer hasn't changed since it was read).
  • Delete node.
  • Suspend

Thread B:

  • Get pointer to next node from our pointer to the root node, which is now "out of sync" and was just deleted, so we crash instead.

With GC you can avoid the possibility of reading from freed (possibly uncommitted) memory, because the node would never be deleted while Thread B is referencing it. There are ways around this, such as hazard pointers or catching SEH exceptions on Windows, but these can hurt performance significantly. GC tends to be the best solution here.
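The interleaving above can be made harmless without a collector by simply never freeing a popped node while the stack is in use. The sketch below does that: popped nodes are parked on a second list and released only in the destructor. This is a crude stand-in for what a GC would guarantee, not a GC and not hazard pointers; `DeferredFreeStack` is a hypothetical name, and memory use grows with the number of pops.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

template <typename T>
class DeferredFreeStack {
    struct Node {
        T value;
        Node* next;          // link inside the live stack, never changed after publication
        Node* retired_next;  // link inside the retired list
    };

    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

    // Park a popped node instead of deleting it.
    void retire(Node* n) {
        assert(n != nullptr);
        n->retired_next = retired_.load(std::memory_order_relaxed);
        while (!retired_.compare_exchange_weak(n->retired_next, n,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
        }
    }

public:
    DeferredFreeStack() = default;
    DeferredFreeStack(const DeferredFreeStack&) = delete;
    DeferredFreeStack& operator=(const DeferredFreeStack&) = delete;

    // Assumes no other thread is still using the stack.
    ~DeferredFreeStack() {
        for (Node* n = head_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
        for (Node* n = retired_.load(); n != nullptr;) {
            Node* next = n->retired_next;
            delete n;
            n = next;
        }
    }

    void push(T value) {
        Node* n = new Node{std::move(value),
                           head_.load(std::memory_order_relaxed), nullptr};
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    bool pop(T& out) {
        Node* old_head = head_.load(std::memory_order_acquire);
        // Reading old_head->next is safe even if another thread has already
        // popped old_head, because popped nodes are retired, not deleted.
        while (old_head != nullptr &&
               !head_.compare_exchange_weak(old_head, old_head->next,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
        }
        if (old_head == nullptr) return false;
        out = std::move(old_head->value);
        retire(old_head);
        return true;
    }
};
```

Because no node address is reused while the stack lives, Thread B's stale pointer still refers to intact memory and its compare-exchange simply fails and retries. A real collector (or hazard pointers, or epoch schemes) would add the missing piece: reclaiming retired nodes once no thread can still hold them.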

There isn't a rationale, because there isn't a garbage collector. The only features C++ ever had for GC were introduced in C++11, and they just mark memory; no collector is required. Nor will there be one in C++14.

In my opinion, there is no way in hell a collector could pass the Committee.

GC has the following advantages:

  1. It can handle circular references without programmer assistance (with RAII style, you have to use weak_ptr to break cycles). So a RAII-style application can still "leak" if it is used improperly.
  2. Creating/destroying tons of shared_ptr instances pointing to a given object can be expensive, because refcount increments/decrements are atomic operations. In multi-threaded applications, the memory locations that contain refcounts will be "hot" spots, putting a lot of pressure on the memory subsystem. GC isn't prone to this specific issue, because it uses reachable sets instead of refcounts.
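The first point can be reproduced in a few lines. This is a minimal sketch with hypothetical names (`StrongNode`, `WeakNode`, `live_after_cycle`, and the `g_live` counter): two nodes refer to each other, once through shared_ptr and once through weak_ptr, and the function reports how many nodes are still alive after every external owner is gone.

```cpp
#include <cassert>
#include <memory>
#include <utility>

int g_live = 0;  // hypothetical counter of nodes currently alive

struct StrongNode {
    std::shared_ptr<StrongNode> other;  // owning reference to the peer
    StrongNode() { ++g_live; }
    ~StrongNode() { --g_live; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> other;  // non-owning reference to the peer
    WeakNode() { ++g_live; }
    ~WeakNode() { --g_live; }
};

// Builds a two-node cycle, drops all outside owners, and returns the number
// of nodes that are still alive afterwards.
template <typename NodeT>
int live_after_cycle() {
    g_live = 0;
    NodeT* first = nullptr;
    {
        auto a = std::make_shared<NodeT>();
        auto b = std::make_shared<NodeT>();
        a->other = b;
        b->other = a;
        first = a.get();
        assert(g_live == 2);
    }
    const int live = g_live;
    if (live != 0) {
        // Break the leaked cycle by hand so this demo cleans up after itself.
        auto released = std::move(first->other);
    }
    return live;
}
```

With owning references on both sides, each node keeps the other's count above zero, so neither destructor runs until the cycle is broken manually. A tracing collector would see that the pair is unreachable and reclaim both.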

I am not saying that GC is the best/good choice. I am just saying that it has different characteristics. In some scenarios that might be an advantage.

None of the answers so far touch upon the most important benefit of adding garbage collection to a language: in the absence of language-supported garbage collection, it's almost impossible to guarantee that no object will be destroyed while references to it exist. Worse, if such a thing does happen, it's almost impossible to guarantee that a later attempt to use the reference won't end up manipulating some other random object.

Although there are many kinds of objects whose lifetimes can be much better managed by RAII than by a garbage collector, there's considerable value in having the GC manage nearly all objects, including those whose lifetime is controlled by RAII. An object's destructor should kill the object and make it useless, but leave the corpse behind for the GC. Any reference to the object will thus become a reference to the corpse, and will remain one until it (the reference) ceases to exist entirely. Only when all references to the corpse have ceased to exist will the corpse itself do so.

While there are ways of implementing garbage collectors without inherent language support, such implementations either require that the GC be informed any time references are created or destroyed (adding considerable hassle and overhead), or run the risk that a reference the GC doesn't know about might exist to an object which is otherwise unreferenced. Compiler support for GC eliminates both those problems.

Definitions:

RCB GC: Reference-Counting Based GC.

MSB GC: Mark-Sweep Based GC.

Quick Answer:

MSB GC should be added into the C++ standard, because it is more handy than RCB GC in certain cases.

Two illustrative examples:

Consider a global buffer whose initial size is small, and any thread can dynamically enlarge its size and keep the old contents accessible for other threads.

Implementation 1 (MSB GC Version):

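// Note: gcnew stands here for a hypothetical garbage-collected allocation;
// it is not part of standard C++.
#include <stddef.h>
#include <string.h>
#include <utility>

int*   g_buf = 0;
size_t g_current_buf_size = 1024;

void InitializeGlobalBuffer()
{
    g_buf = gcnew int[g_current_buf_size];
}

int GetValueFromGlobalBuffer(size_t index)
{
    return g_buf[index];
}

void EnlargeGlobalBufferSize(size_t new_size)
{
    if (new_size > g_current_buf_size)
    {
        auto tmp_buf = gcnew int[new_size];
        memcpy(tmp_buf, g_buf, g_current_buf_size * sizeof(int));
        std::swap(tmp_buf, g_buf);
        g_current_buf_size = new_size;  // remember the new capacity
        // The old buffer is not freed here; the collector reclaims it only
        // once no thread can reach it any more.
    }
}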

Implementation 2 (RCB GC Version):

#include <stddef.h>
#include <string.h>
#include <memory>
#include <utility>

std::shared_ptr<int> g_buf;
size_t g_current_buf_size = 1024;

std::shared_ptr<int> NewBuffer(size_t size)
{
    // shared_ptr<int> needs an array deleter to release memory from new[].
    return std::shared_ptr<int>(new int[size], [](int* p) { delete[] p; });
}

void InitializeGlobalBuffer()
{
    g_buf = NewBuffer(g_current_buf_size);
}

int GetValueFromGlobalBuffer(size_t index)
{
    // Reads g_buf without any synchronization (this is the racy part).
    return g_buf.get()[index];
}

void EnlargeGlobalBufferSize(size_t new_size)
{
    if (new_size > g_current_buf_size)
    {
        auto tmp_buf = NewBuffer(new_size);
        memcpy(tmp_buf.get(), g_buf.get(), g_current_buf_size * sizeof(int));
        std::swap(tmp_buf, g_buf);
        g_current_buf_size = new_size;  // remember the new capacity

        //
        // Now tmp_buf owns the old g_buf, when tmp_buf is destructed,
        // the old g_buf will also be deleted.
        //
    }
}

PLEASE NOTE:

After calling std::swap(tmp_buf, g_buf), tmp_buf owns the old g_buf. When tmp_buf is destroyed, the old g_buf will also be deleted.

If another thread is calling GetValueFromGlobalBuffer(index) to fetch a value from the old g_buf at that moment, then a race hazard will occur!

So, though implementation 2 looks as elegant as implementation 1, it doesn't work!

If we want to make implementation 2 work correctly, we must add some kind of locking mechanism; then it will be not only slower, but also less elegant than implementation 1.
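For comparison, one possible locked variant of implementation 2 is sketched below. It is an assumption about what "some kind of locking mechanism" could look like, not the answer's own code: the buffer is a `std::vector<int>` held by a shared_ptr, and `g_mutex` and `Snapshot` are names introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using Buffer = std::vector<int>;

std::mutex g_mutex;  // guards g_buf itself, not the elements
std::shared_ptr<Buffer> g_buf = std::make_shared<Buffer>(1024);

// Copy the shared_ptr under the lock; the copy keeps the buffer alive even
// if another thread replaces g_buf right afterwards.
std::shared_ptr<Buffer> Snapshot() {
    std::lock_guard<std::mutex> guard(g_mutex);
    return g_buf;
}

int GetValueFromGlobalBuffer(std::size_t index) {
    const std::shared_ptr<Buffer> buf = Snapshot();
    return buf->at(index);  // throws std::out_of_range on a bad index
}

void EnlargeGlobalBufferSize(std::size_t new_size) {
    std::lock_guard<std::mutex> guard(g_mutex);
    assert(g_buf != nullptr);
    if (new_size > g_buf->size()) {
        auto bigger = std::make_shared<Buffer>(*g_buf);  // copy old contents
        bigger->resize(new_size);                        // new slots are zero
        g_buf = std::move(bigger);
        // The old buffer is freed when its last snapshot goes away.
    }
}
```

Every read now pays for a mutex acquisition plus an atomic refcount increment and decrement, which is the overhead the answer is pointing at.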

Conclusion:

It is good to take MSB GC into the C++ standard as an optional feature.
