Implementing concurrent_vector based on the Intel blog

Question

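I am trying to implement a thread-safe, lock-free container analogous to std::vector, following this post: https://software.intel.com/en-us/blogs/2008/07/24/tbbconcurrent_vector-secrets-of-memory-organization

As far as I understand, to avoid reallocation (which would invalidate all iterators on all threads), they add new contiguous blocks instead of keeping a single contiguous array.
The size of each block they add is the next power of 2, so they can use log2(index) to find the segment in which the item at [index] belongs.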
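
As a rough illustration of the log2(index) lookup described above, here is a minimal sketch. It assumes the layout used in the code further down (segments of 2, 4, 8, ... elements); `slot` and `locate` are hypothetical names, and TBB's real layout may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: which segment an index falls in, and where inside it.
struct slot {
  std::size_t segment;
  std::size_t offset;
};

// Assumes segment k holds 2^(k+1) items (2, 4, 8, ...).
// Shifting the index by 2 makes segment k cover shifted values [2^(k+1), 2^(k+2)).
inline slot locate(std::size_t index) {
  const std::size_t shifted = index + 2;
  assert(shifted > index);  // guards against overflow of index + 2
  std::size_t log2 = 0;
  for (std::size_t v = shifted; v > 1; v >>= 1) ++log2;  // floor(log2(shifted))
  return slot{log2 - 1, shifted - (std::size_t{1} << log2)};
}
```

With this layout, indices 0 and 1 land in segment 0, indices 2 to 5 in segment 1, indices 6 to 13 in segment 2, and index 14 is the first item of segment 3.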

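From what I gather, they keep a static array of pointers to the segments so they can access them quickly. However, they don't know how many segments the user will want, so they start with a small initial array, and if the number of segments exceeds its capacity, they allocate a much larger one and switch to using that.

The problem is that adding a new segment can't be done in a lock-free, thread-safe manner, or at least I haven't figured out how. I can atomically increment the current size, but that is all.
Also, switching from the small array of segment pointers to the large one involves a big allocation and memory copies, so I can't understand how they do it.

They have some code posted online, but the source for all the important functions isn't available; they live in their Threading Building Blocks DLL. Here is some code that demonstrates the issue: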

template<typename T>
class concurrent_vector
{
    private:
        int size = 0;
        int lastSegmentIndex = 0;

        union
        {
            T* segmentsSmall[3];
            T** segmentsLarge;
        };

        void switch_to_large()
        {
            //Bunch of allocations, creates a T* segmentsLarge[32] basically and reassigns all old entries into it
        }

    public:
        concurrent_vector()
        {
            //The initial array is contiguous just for the sake of cache optimization
            T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3
            segmentsSmall[0] = initialContiguousBlock;
            segmentsSmall[1] = initialContiguousBlock + 2;
            segmentsSmall[2] = initialContiguousBlock + 2 + 4;
        }

        void push_back(T& item)
        {
            if(size > 2 + 4 + 8)
            {
                switch_to_large(); //This is the problem part, there is no possible way to make this thread-safe without a mutex lock. I don't understand how Intel does it. It includes a bunch of allocations and memory copies.
            }

            InterlockedIncrement(&size); //Ok, so size is atomically increased

            //afterwards adds the item to the appropriate slot in the appropriate segment
        }
};

Answer 1

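I would not try to make segmentsLarge and segmentsSmall a union. Yes, this wastes one more pointer. But then that pointer, let's call it segments, can initially point to segmentsSmall.

In return, the other methods can always use the same pointer, which makes them simpler.

And the switch from small to large can be accomplished by a single compare-exchange of one pointer.

I am not sure how this could be done safely with a union.

The idea would look something like this (note that I used C++11, which the Intel library predates, so they most likely did this with their own atomic intrinsics). This probably misses quite a few details that I am sure the Intel people have thought about more, so you will likely have to check it against the implementations of all the other methods.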

#include <algorithm> // std::copy
#include <array>
#include <atomic>
#include <climits>
#include <cstddef>

template<typename T>
class concurrent_vector
{
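private:
  // Number of slots in the large segment table: one per bit of size_t
  static constexpr std::size_t numSegments = sizeof(std::size_t) * CHAR_BIT;
  std::atomic<size_t> size;
  std::atomic<T**> segments;
  std::array<T*, 3> segmentsSmall;
  unsigned lastSegmentIndex = 0;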

  void switch_to_large()
  {
    T** segmentsOld = segments;
    if( segmentsOld == segmentsSmall.data()) {
      // not yet switched
      T** segmentsLarge = new T*[sizeof(size_t) * CHAR_BIT];
      // note that we leave the original segment allocations alone and just copy the pointers
      std::copy(segmentsSmall.begin(), segmentsSmall.end(), segmentsLarge);
      for(unsigned i = segmentsSmall.size(); i < numSegments; ++i) {
        segmentsLarge[i] = nullptr;
      }
      // now both the old and the new segments array are valid
      if( segments.compare_exchange_strong(segmentsOld, segmentsLarge)) {
        // success!
        return;
      }  else {
        // already switched, just clean up
        delete[] segmentsLarge;
      }
    }
  }

public:
  concurrent_vector()  : size(0), segments(segmentsSmall.data())
  {
    //The initial array is contiguous just for the sake of cache optimization
    T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3
    segmentsSmall[0] = initialContiguousBlock;
    segmentsSmall[1] = initialContiguousBlock + 2;
    segmentsSmall[2] = initialContiguousBlock + 2 + 4;
  }

  void push_back(T& item)
  {
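    // The small table holds 2 + 4 + 8 = 14 items, so the push at size == 14 already needs the large table
    if(size >= 2 + 4 + 8) {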
      switch_to_large();
    }
    // here we may have to allocate more segments atomically
    ++size;

    //afterwards adds the item to the appropriate slot in the appropriate segment
  }
};
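
To illustrate the "allocate more segments atomically" comment in push_back, here is a minimal sketch of one way it could be done; it is not Intel's implementation. Assumptions: `segmented_store` and all of its members are hypothetical names, the segment table has a fixed size (so the small-to-large switch is left out), and T is default-constructible and copy-assignable. Each thread claims an index with fetch_add, and the first thread that needs a segment publishes it with a compare-exchange; a thread that loses the race frees its own allocation and uses the winner's.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: all names here are illustrative, not TBB's API.
template <typename T>
class segmented_store {
  // Assumption: a fixed table, so no small-to-large switch is needed.
  static constexpr std::size_t kMaxSegments = 16;

  std::atomic<std::size_t> size_;
  std::atomic<T*> segments_[kMaxSegments];

  // Segment k holds 2^(k+1) items (2, 4, 8, ...), so k = floor(log2(index + 2)) - 1.
  static std::size_t segment_of(std::size_t index) {
    std::size_t k = 0;
    for (std::size_t v = (index + 2) >> 1; v > 1; v >>= 1) ++k;
    return k;
  }

  // Index of the first item stored in segment k: 2^(k+1) - 2.
  static std::size_t base_of(std::size_t k) { return (std::size_t{2} << k) - 2; }

  T* get_or_create_segment(std::size_t k) {
    T* seg = segments_[k].load(std::memory_order_acquire);
    if (seg == nullptr) {
      T* fresh = new T[std::size_t{2} << k]();
      // Publish only if the slot is still empty; on failure seg receives the winner's pointer.
      if (segments_[k].compare_exchange_strong(seg, fresh,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire)) {
        seg = fresh;
      } else {
        delete[] fresh;
      }
    }
    return seg;
  }

public:
  segmented_store() : size_(0) {
    for (auto& s : segments_) s.store(nullptr, std::memory_order_relaxed);
  }
  ~segmented_store() {
    for (auto& s : segments_) delete[] s.load(std::memory_order_relaxed);
  }
  segmented_store(const segmented_store&) = delete;
  segmented_store& operator=(const segmented_store&) = delete;

  // Claims the next index atomically, then writes the item; returns that index.
  std::size_t push_back(const T& item) {
    const std::size_t index = size_.fetch_add(1);
    const std::size_t k = segment_of(index);
    assert(k < kMaxSegments);
    get_or_create_segment(k)[index - base_of(k)] = item;
    return index;
  }

  T& operator[](std::size_t index) {
    const std::size_t k = segment_of(index);
    T* seg = segments_[k].load(std::memory_order_acquire);
    assert(seg != nullptr);
    return seg[index - base_of(k)];
  }

  std::size_t size() const { return size_.load(); }
};
```

Note that size is incremented before the element is written, so a concurrent reader can observe an index whose slot has not been assigned yet. A real container has to handle that case, and also the failure of new inside push_back.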

Answer 1: score 3, accepted, 2017-10-19 13:45:57
