
C++ weak_ptr creation performance

I've read that creating or copying a std::shared_ptr involves some overhead (an atomic increment of the reference counter, etc.).

But what about creating a std::weak_ptr from it instead:

Obj * obj = new Obj();
// fast
Obj * o = obj;
// slow
std::shared_ptr<Obj> a(o);
// slow
std::shared_ptr<Obj> b(a);
// slow ?
std::weak_ptr<Obj> c(b);

I was hoping for somewhat faster performance, but I know that the shared pointer still has to increment the weak reference counter. So is this still as slow as copying a shared_ptr into another?

This is from my days with game engines.

The story goes:

We needed a fast shared-pointer implementation, one that doesn't thrash the cache (caches are smarter now, by the way).

A normal pointer:

XXXXXXXXXXXX....
^--pointer to data

Our shared pointer:

iiiiXXXXXXXXXXXXXXXXX...
^   ^---pointer stored in shared pointer
|
+---the start of the allocation, the allocation is sizeof(unsigned int)+sizeof(T)

The unsigned int* used for the count is at ((unsigned int*)ptr)-1

that way a "shared pointer" is pointer sized,and the data it contains is the pointer to the actual data. 这样一个“共享指针”是指针大小,它包含的数据是指向实际数据的指针。 So (because template => inline and any compiler would inline an operator returning a data member) it was the same "overhead" for access as a normal pointer. 所以(因为template => inline而且任何编译器都会内联一个运算符返回一个数据成员)它与普通指针的访问权限相同。

Creating a pointer took about 3 more CPU instructions than normal (the read from location -4 is one operation, then the add of 1, then the write back to location -4).
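A minimal sketch of that layout (hypothetical names, not the engine's actual code; it assumes alignof(T) fits within the 4-byte header, as on the system discussed further down, and omits copy assignment for brevity):

#include <cstdlib>
#include <new>
#include <utility>

template <typename T>
class tiny_shared_ptr {
    T* ptr_ = nullptr;   // the handle itself is just one pointer

    // the count lives immediately before the object: ((unsigned*)ptr_) - 1
    unsigned* count() const { return reinterpret_cast<unsigned*>(ptr_) - 1; }

public:
    template <typename... Args>
    static tiny_shared_ptr make(Args&&... args) {
        // single allocation: [unsigned count][T object]
        void* raw = std::malloc(sizeof(unsigned) + sizeof(T));
        unsigned* c = new (raw) unsigned(1);
        tiny_shared_ptr p;
        p.ptr_ = new (c + 1) T(std::forward<Args>(args)...);
        return p;
    }

    tiny_shared_ptr() = default;
    tiny_shared_ptr(const tiny_shared_ptr& o) : ptr_(o.ptr_) {
        if (ptr_) ++*count();                // the ~3 extra instructions: load, add, store
    }
    ~tiny_shared_ptr() {
        if (ptr_ && --*count() == 0) {
            ptr_->~T();
            std::free(count());              // count() is the start of the allocation
        }
    }

    T& operator*() const { return *ptr_; }   // dereferencing costs the same as a raw pointer
    T* operator->() const { return ptr_; }
};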

Now, we'd only use weak pointers when we were debugging (so we'd compile with the DEBUG macro defined), because then we'd want to see all allocations, what was going on, and so forth. It was useful.

The weak pointers must know when what they point to is gone, NOT keep the thing they point to alive (in my case, if the weak pointer kept the allocation alive the engine would never get to recycle or free any memory, and then it's basically a shared pointer anyway).

So each weak pointer has a bool, alive or something like that, and is a friend of shared_pointer.

When debugging, our allocation looked like this:

vvvvvvvviiiiXXXXXXXXXXXXX.....
^       ^   ^ the pointer we stored (to the data)
|       +that pointer -4 bytes = ref counter
+Initial allocation now 
    sizeof(linked_list<weak_pointer<T>*>)+sizeof(unsigned int)+sizeof(T)

The linked-list structure you use depends on what you care about. We wanted to stay as close to sizeof(T) as we could (we managed memory using the buddy algorithm), so we stored a pointer to the weak_pointer and used the XOR trick... good times.

Anyway: the weak pointers to whatever the shared_pointers point to are put in a list, stored somehow in the "v"s above.

When the reference count hits zero, you go through that list (which is a list of pointers to the actual weak_pointers; they remove themselves from it when destroyed, obviously) and set alive = false (or something similar) on each weak_pointer.

The weak_pointers now know that what they point to is no longer there (so they threw when dereferenced).
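Roughly, that debug-build bookkeeping might look like the following (a hypothetical sketch with made-up names and a plain singly-linked list rather than the XOR trick, not the engine's actual code):

// one intrusively-linked node per live weak pointer
struct weak_node {
    weak_node* next = nullptr;
    bool alive = true;                    // the flag each weak pointer checks (and throws on) before dereferencing
};

// the extra header prepended to the allocation in debug builds (the "v"s in the diagram)
struct debug_header {
    weak_node* weak_list = nullptr;       // registered weak pointers
    unsigned strong_count = 1;
};

// called when a shared pointer releases its reference
inline void release_strong(debug_header* h) {
    if (--h->strong_count == 0) {
        // walk the list and mark every weak pointer dead before freeing the object
        for (weak_node* n = h->weak_list; n; n = n->next)
            n->alive = false;
        // ... destroy the T and return the memory to the allocator here ...
    }
}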

In this scheme:

There is no overhead. (The alignment was 4 bytes on that system; 64-bit systems tend to like 8-byte alignment, so in that case union the ref-counter with an int[2] to pad it out. Remember, this involves placement new (nobody downvote just because I mentioned it :P) and such. You need to make sure the struct you impose on the allocation matches what you actually allocated and constructed. Compilers can align things on their own, hence int[2] rather than int, int.)
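A sketch of that padding idea (assuming the usual 4-byte int, so the header occupies 8 bytes and the T placed after it stays 8-byte-aligned):

// pad the 4-byte counter out to 8 bytes so the T placed after it stays 8-byte-aligned
union ref_header {
    unsigned count;
    int pad[2];   // int[2] rather than int, int: a single array member can't be split or padded apart
};

static_assert(sizeof(ref_header) == 8, "header must pad the allocation to 8 bytes");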

You can dereference the shared_pointer with no overhead at all.

Making new shared pointers doesn't thrash the cache at all and takes 3 CPU instructions. They're not very... pipeline-able, but the compiler will always inline the getters and setters (if not always, then probably always :P), and there'll be something around the call site that can fill the pipeline.

The destructor of a shared pointer also does very little (a decrement, that's it), so that's great too!

High-performance note

If you have a situation like:

void f() {
   shared_pointer<T> ptr;
   g(ptr);
}

There's no guarantee that the optimiser will dare to skip the increments and decrements that come from passing the shared_pointer "by value" to g.

This is where you'd use a normal reference (which is implemented as a pointer).

So you'd do g(ptr.extract_reference()); instead. Again, the compiler will inline the simple getter.

Now you have a T&. Because ptr's scope entirely surrounds the call to g (assuming g has no side effects and so forth), that reference will be valid for the duration of g.
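With std::shared_ptr the same pattern is just dereferencing before the call (extract_reference is the engine's own accessor, not a standard function); a minimal sketch:

#include <memory>
#include <string>

void g(const std::string& s) { /* use s */ }   // takes a plain reference, not a shared_ptr

void f() {
    auto ptr = std::make_shared<std::string>("hello");
    g(*ptr);   // no atomic increment/decrement at the call; ptr outlives g
}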

Deleting through a reference is very ugly and you probably couldn't do it by accident (we relied on this fact).

In hindsight

I should have created a type called "extracted_pointer" or something; it'd be really hard to type that by mistake for a class member.

The weak/shared pointers used by libstdc++

http://gcc.gnu.org/onlinedocs/libstdc++/manual/shared_ptr.html

Not as fast...

But don't worry about the odd cache miss unless you're making a game engine that can't comfortably run a decent workload at > 120 fps :P Still miles better than Java.

The stdlib way is nicer. Each object has its own allocation and job. With our shared_pointer it was a true case of "trust me, it works, try not to worry about how" (not that it's hard), because the code looked really messy.

If you undid the... whatever they've done to the names of the variables in their implementation, it'd be far easier to read. See Boost's implementation, as that documentation says.

Other than the variable names, the GCC stdlib implementation is lovely. You can read it easily, and it does its job properly (following OO principles), but it is a little slower and MAY thrash the cache on crappy chips these days.

UBER high-performance note

You may be thinking: why not have XXXX...XXXXiiii (the reference count at the end)? Then you'd get the alignment that's best for the allocator!

Answer:

Because having to compute pointer+sizeof(T) may not be one CPU instruction! (Subtracting 4 or 8 is something a CPU can do easily, simply because it makes sense; it'll be doing that a lot.)

In addition to Alec's very interesting description of the shared/weak_ptr system used in his previous projects, I wanted to give a little more detail on what is likely to be happening for a typical std::shared_ptr/weak_ptr implementation:

// slow
std::shared_ptr<Obj> a(o);

The main expense in the above construction is allocating a block of memory to hold the two reference counts. No atomic operations need be done here (aside from whatever the implementation may or may not do under operator new).
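Worth noting: std::make_shared lets the implementation fold that control-block allocation into the same allocation as the object itself, e.g.:

#include <memory>

struct Obj {};

int main() {
    Obj* o = new Obj();
    std::shared_ptr<Obj> a(o);          // two allocations: the Obj, plus the control block

    auto a2 = std::make_shared<Obj>();  // one allocation holding both the Obj and the counts
}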

// slow
std::shared_ptr<Obj> b(a);

The main expense in the copy construction is typically a single atomic increment.

// slow ?
std::weak_ptr<Obj> c(b);

The main expense in this weak_ptr constructor is typically a single atomic increment. I would expect the performance of this constructor to be nearly identical to that of the shared_ptr copy constructor.

Two other important constructors to be aware of are:

std::shared_ptr<Obj> d(std::move(a));  // shared_ptr(shared_ptr&&);
std::weak_ptr<Obj> e(std::move(c));    // weak_ptr(weak_ptr&&);

(And matching move assignment operators as well)

The move constructors do not require any atomic operations at all. They just copy the reference count from the rhs to the lhs, and make the rhs == nullptr.

The move assignment operators require an atomic decrement only if the lhs != nullptr prior to the assignment. The bulk of the time (e.g. within a vector<shared_ptr<T>>) the lhs == nullptr prior to a move assignment, and so there are no atomic operations at all.

The latter (the weak_ptr move members) are not actually C++11, but are being handled by LWG 2315. However, I would expect them to already be implemented by most implementations (I know they are already implemented in libc++).

These move members will be used when scooting smart pointers around in containers, e.g. under vector<shared_ptr<T>>::insert/erase, and can have a measurable positive impact compared to the use of the smart pointer copy members.

I point it out so that you will know that if you have the opportunity to move instead of copy a shared_ptr/weak_ptr, it is worth the trouble to type the few extra characters to do so.
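As a small illustration of the copy-versus-move difference (with a made-up Obj type, nothing implementation-specific):

#include <memory>
#include <utility>
#include <vector>

struct Obj {};

int main() {
    auto b = std::make_shared<Obj>();

    std::shared_ptr<Obj> copied = b;             // copy: one atomic increment
    std::shared_ptr<Obj> moved = std::move(b);   // move: no atomic operations, b becomes empty

    std::vector<std::shared_ptr<Obj>> v;
    v.push_back(std::move(moved));               // moving into the container avoids a copy's
                                                 // increment/decrement pair
}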
