Optimizing for Speed: Queue of Vectors vs. Queue of Pointers to Vectors

I am trying to store vectors (actually, objects that each manage a vector) in a queue so that I can process them later.

Here's my current implementation:

// in constructor:
q = new boost::lockfree::spsc_queue<MyObject>(num_elements_in_q);
// ...
bool Push(const MyObject& push_me) { return q->push(push_me); }
// ...
// in Pop() (i.e., this is how I pop stuff off of the queue)
MyObject temp;
q->pop(temp);  // pop() fills its argument by reference

I am wondering whether it would make sense to store pointers instead of the objects. Here's what the new code would look like:

// in constructor:
q = new boost::lockfree::spsc_queue<MyObject*>(num_elements_in_q);
// ...
bool Push(const MyObject& push_me) {
  MyObject* ptr = new MyObject(push_me);
  if (q->push(ptr)) return true;  // queue the pointer, not the object
  delete ptr;                     // queue was full: do not leak the copy
  return false;
}
// ...
// in Pop() (i.e., this is how I pop stuff off of the queue)
MyObject* ptr = nullptr;
q->pop(ptr);  // pop() fills its argument by reference
// do stuff with ptr
delete ptr;

Which approach is better for minimizing the time that the push operation takes? In general, is it better to store the entire MyObject or to store only pointers (and allocate the memory dynamically)? I realize that storing the entire MyObject still involves dynamic memory, since the vector inside MyObject needs to be resized.

My ultimate goal is to minimize the time pushing takes (as well as any timing jitter from one operation to the next), at the expense of memory usage and of the time Pop() takes to execute (the first version requires a copy in Pop() that using pointers avoids).

Thanks for the help. Also, I do not currently have access to a profiler on this system; otherwise I might already have my answer.

Without actually testing it, I would say that the memory allocation using new could cost more than copying the whole MyObject. Of course, it depends on how MyObject is implemented.

Another thing to consider is that storing the objects themselves may give you a higher cache hit rate, assuming boost::lockfree stores its data in contiguous memory, because neighbouring objects can then be fetched by the CPU together and sit side by side in the L1 cache. Using pointers will cause the CPU to load data from wherever each pointer points, potentially evicting other queue elements from the cache.

Of course, to be 100% sure you have to measure it.

If speed is the ultimate goal, look at using some sort of intrusive pattern. By intrusive, I mean adding linking pointers to each of your objects and using those pointers to construct your queues. The big advantage is that no memory is allocated when an object is added to the queue. And if you allocate all your objects in one big block (for example, in a vector), they will remain close together, which means that iterating through the list is less likely to incur cache misses.

This does mean that you will probably need to implement your own locking on the queue, but please bear in mind that properly implemented uncontended mutexes should be more or less as cheap as the atomic operations used for lock-free programming.

Take a look at Boost Intrusive for details of the templated Boost implementation.
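To make the intrusive idea concrete, here is a minimal hand-rolled sketch. It is not Boost Intrusive and it is not thread-safe: the `next` member, the `IntrusiveQueue` class and its method names are all hypothetical, and the locking discussed above is left out. It only shows that pushing and popping rewire pointers inside objects that already exist, so neither operation allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical payload type. The `next` link stored inside the object
// is what makes the container "intrusive".
struct MyObject {
    std::vector<int> data;
    MyObject* next = nullptr;
};

// Minimal single-threaded intrusive FIFO (assumption: the caller adds
// whatever locking is needed). Push and Pop only rewire pointers.
class IntrusiveQueue {
public:
    void Push(MyObject& obj) {
        // An object may sit in the queue only once at a time.
        assert(obj.next == nullptr && &obj != tail_);
        if (tail_ != nullptr) {
            tail_->next = &obj;
        } else {
            head_ = &obj;
        }
        tail_ = &obj;
    }

    // Returns the oldest queued object, or nullptr if the queue is empty.
    MyObject* Pop() {
        MyObject* front = head_;
        if (front == nullptr) return nullptr;
        head_ = front->next;
        if (head_ == nullptr) tail_ = nullptr;
        front->next = nullptr;  // ready to be queued again
        return front;
    }

    bool Empty() const { return head_ == nullptr; }

private:
    MyObject* head_ = nullptr;
    MyObject* tail_ = nullptr;
};
```

With the objects preallocated in one block, for example `std::vector<MyObject> pool(n);`, calling `q.Push(pool[i])` links an existing object into the queue without copying it, and the caller stays responsible for keeping the pool alive while anything is queued.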

Given that the only real way to figure out what is going on is to measure, I used a crude method to find out what my execution times were for both implementations.
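For readers who want to repeat this kind of crude measurement, a sketch along these lines may help. It is not the code used for the numbers in this answer: it uses `std::chrono::steady_clock` instead of Boost's timer, it times the whole loop and divides rather than timing each call, and `std::queue` stands in for the lock-free queue. `MyObject`, `AverageSecondsPerCall`, `TimeByValue` and `TimeByPointer` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for the real object that manages a vector.
struct MyObject {
    std::vector<int> data;
};

// Calls `push` n times and returns the average wall-clock seconds per call.
template <typename PushFn>
double AverageSecondsPerCall(std::size_t n, PushFn push) {
    assert(n > 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        push();
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count() / static_cast<double>(n);
}

// Variant 1: the queue stores whole objects, so each push copies `sample`.
double TimeByValue(std::size_t n, const MyObject& sample) {
    std::queue<MyObject> q;  // assumption: stand-in for the real queue
    return AverageSecondsPerCall(n, [&] { q.push(sample); });
}

// Variant 2: the queue stores pointers, so each push allocates a copy.
double TimeByPointer(std::size_t n, const MyObject& sample) {
    std::queue<MyObject*> q;  // assumption: stand-in for the real queue
    const double avg =
        AverageSecondsPerCall(n, [&] { q.push(new MyObject(sample)); });
    while (!q.empty()) {  // clean up outside the timed region
        delete q.front();
        q.pop();
    }
    return avg;
}
```

Running both functions several times with the same `n` and the same sample object gives per-call averages that can be compared in the same way as the runs listed in this answer.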

The following are results from a run of 2500 insertions into the queue. Times are in seconds, based on a boost::timer surrounding the function call. Note that these are average times per call.

For storing whole objects:
Run 1: 0.000343423
Run 2: 0.000338752
Run 3: 0.000339651
Run 4: 0.000320011
Run 5: 0.00034017

For storing pointers:
Run 1: 0.00033717
Run 2: 0.00033645
Run 3: 0.000336106
Run 4: 0.00033674
Run 5: 0.000336841

I then went back and increased the test to 25,000 insertions, since I was wondering whether something was going on initially with cache misses and the like. Results are below:

For storing whole objects:
Run 1: 0.00023566
Run 2: 0.000255699
Run 3: 0.000250765
Run 4: 0.000239108
Run 5: 0.000264594

For storing pointers:
Run 1: 0.000317314
Run 2: 0.000316985
Run 3: 0.000414893
Run 4: 0.000334542
Run 5: 0.00033179

So it looks like (and this is just my theory) the vectors found in the objects get properly sized during the initial Push() calls. From there, the copy constructor no longer has to pay the penalty of resizing the vector each time, and it becomes a much more efficient process.
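One way to probe a theory like this in isolation is to count allocations. The sketch below does not show what the Boost queue does internally (that is not established here). It only illustrates the general effect being described: copying into a vector that already has enough capacity can avoid an allocation that a fresh copy has to make. `CountingAllocator`, `AllocationCount` and the two helper functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Global counter bumped on every allocation made through CountingAllocator.
inline std::size_t& AllocationCount() {
    static std::size_t count = 0;
    return count;
}

// Minimal allocator that forwards to std::allocator and counts allocations.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++AllocationCount();
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        std::allocator<T>().deallocate(p, n);
    }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return false;
}

using CountedVector = std::vector<int, CountingAllocator<int>>;

// Copy construction has no existing buffer to reuse.
std::size_t AllocationsForCopyConstruct(const CountedVector& src) {
    const std::size_t before = AllocationCount();
    CountedVector fresh(src);
    return AllocationCount() - before;
}

// Copy assignment into a vector that is already large enough.
std::size_t AllocationsForCopyAssign(const CountedVector& src,
                                     CountedVector& warm) {
    assert(warm.capacity() >= src.size());
    const std::size_t before = AllocationCount();
    warm = src;
    return AllocationCount() - before;
}
```

On the mainstream standard library implementations the assignment into the warm vector performs no allocation, although the C++ standard does not strictly promise that, so treat the result as an observation about a given toolchain rather than a guarantee.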

Agreed that storing a pointer has to be cheaper than storing something larger than a pointer, in almost every circumstance.

In each case, there appears to be a copy construction of a MyObject. By letting the caller be responsible for the lifetime of the object, there is an opportunity to remove this construction:

  1. Offering an rvalue interface would allow the use of move construction instead, which may be considerably more lightweight, depending on the representation chosen for MyObject.
  2. Alternatively, you could pass and queue std::unique_ptr<MyObject> smart pointers instead, forcing the caller to explicitly manage the construction and lifetime guarantees of the objects.
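Both options can be sketched as follows. `std::queue` is used here purely as a single-threaded stand-in; whether a given version of the Boost lock-free queue accepts rvalues or move-only element types such as `std::unique_ptr` is something to check against its documentation, and is not assumed here. `ValueQueue`, `OwningPointerQueue` and the `MyObject` layout are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real object that manages a vector.
struct MyObject {
    std::vector<int> data;
};

// Option 1: offer an rvalue overload so callers can move instead of copy.
class ValueQueue {
public:
    bool Push(const MyObject& obj) {  // copies the vector contents
        q_.push(obj);
        return true;
    }
    bool Push(MyObject&& obj) {  // moves: takes over the vector's buffer
        q_.push(std::move(obj));
        return true;
    }
    bool Pop(MyObject& out) {
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::queue<MyObject> q_;  // assumption: stand-in for the real queue
};

// Option 2: queue owning smart pointers; the caller constructs the object
// and hands ownership over explicitly.
class OwningPointerQueue {
public:
    bool Push(std::unique_ptr<MyObject> obj) {
        assert(obj != nullptr);
        q_.push(std::move(obj));
        return true;
    }
    // Returns an empty pointer when the queue is empty.
    std::unique_ptr<MyObject> Pop() {
        if (q_.empty()) return nullptr;
        std::unique_ptr<MyObject> front = std::move(q_.front());
        q_.pop();
        return front;
    }

private:
    std::queue<std::unique_ptr<MyObject>> q_;  // stand-in, as above
};
```

A caller that no longer needs its object would write `queue.Push(std::move(obj));` for the first option, or build the object directly into a `std::unique_ptr<MyObject>` and pass that for the second, so the cost of construction is paid wherever the caller chooses rather than inside Push().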


 