Complexity of merging K sorted arrays/vectors

Question

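While studying the problem of merging k sorted contiguous arrays/vectors, and how its implementation differs from merging k sorted linked lists, I found two relatively simple naive solutions for merging k contiguous arrays, plus an optimized method based on pairwise merging that simulates how mergeSort() works. The two naive solutions I implemented appear to have the same complexity, but in a large randomized test I ran, one seems to be far less efficient than the other.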
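The pairwise-merging idea mentioned above is not shown in the question. A minimal sketch, assuming it merges neighbouring vectors in rounds the way the merge phase of merge sort does, might look like this. The name mergePairwise is hypothetical, and std::merge stands in for whatever two-way merge the original code used.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical sketch: merge neighbours in rounds, roughly halving the
// number of vectors each round, until a single sorted vector remains.
std::vector<int> mergePairwise(std::vector<std::vector<int>> lists) {
  if (lists.empty()) return {};
  while (lists.size() > 1) {
    std::vector<std::vector<int>> next;
    for (std::size_t i = 0; i + 1 < lists.size(); i += 2) {
      std::vector<int> merged;
      merged.reserve(lists[i].size() + lists[i + 1].size());
      std::merge(lists[i].begin(), lists[i].end(),
                 lists[i + 1].begin(), lists[i + 1].end(),
                 std::back_inserter(merged));
      next.push_back(std::move(merged));
    }
    // With an odd count, the last vector has no partner and carries over.
    if (lists.size() % 2 == 1) next.push_back(std::move(lists.back()));
    lists = std::move(next);
  }
  return std::move(lists.front());
}
```

Each round touches every element once and there are about log2(k) rounds, so this sketch runs in O(nk log k) for k vectors of average size n.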

Naive merge

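My naive merge method works as follows. We create an output vector<int> and set it to the first of the k vectors we are given. We then merge in the second vector, then the third, and so on. Since a typical merge() method that takes two vectors and returns one is asymptotically linear, in both space and time, in the number of elements of the two vectors, the total complexity will be O(n + 2n + 3n + ... + kn), where n is the average number of elements in each list. Since 1n + 2n + 3n + ... + kn = n*k(k+1)/2, I believe the total complexity is O(n*k^2). Consider the following code: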

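vector<int> mergeInefficient(const vector<vector<int> >& multiList) {
  if (multiList.empty()) return vector<int>();  // avoid reading multiList[0] on empty input

  // Fold each remaining vector into the running result, one two-way merge at a time.
  vector<int> finalList = multiList[0];
  for (size_t j = 1; j < multiList.size(); ++j) {
    finalList = mergeLists(multiList[j], finalList);
  }

  return finalList;
}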
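The question calls mergeLists() but never shows it. A minimal sketch, assuming it is an ordinary two-way merge of sorted vectors with the signature implied by the call above, could be built on std::merge:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper (not shown in the question): merges two sorted vectors
// into a new sorted vector in time linear in a.size() + b.size().
std::vector<int> mergeLists(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<int> out;
  out.reserve(a.size() + b.size());
  std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}
```

Because the result is a fresh vector each time, the j-th call in the loop above copies every element accumulated so far, which is where the 2n + 3n + ... + kn total comes from.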

Naive selection

My second naive solution works as follows:

/**
 * The logic behind this algorithm is fairly simple and inefficient.
 * Basically we want to start with the first values of each of the k
 * vectors, pick the smallest value and push it to our finalList vector.
 * We then need to be looking at the next value of the vector we took the
 * value from so we don't keep taking the same value. A vector of vector
 * iterators is used to hold our position in each vector. While all iterators
 * are not at the .end() of their corresponding vector, we maintain a minValue
 * variable initialized to INT_MAX, and a minValueIndex variable and iterate over
 * each of the k vector iterators and if the current iterator is not an end position
 * we check to see if it is smaller than our minValue. If it is, we update our minValue
 * and set our minValue index (this is so we later know which iterator to increment after
 * we iterate through all of them). We do a check after our iteration to see if minValue
 * still equals INT_MAX. If it does, all iterators are at the .end() position, and we have
 * exhausted every vector and can stop iterating over all k of them. Regarding the complexity
 * of this method, we are iterating over `k` vectors so long as at least one value has not been
 * accounted for. Since there are `nk` values where `n` is the average number of elements in each
 * list, the time complexity = O(nk^2) like our other naive method.
 */
vector<int> mergeInefficientV2(const vector<vector<int> >& multiList) {
  vector<int> finalList;
  vector<vector<int>::const_iterator> iterators(multiList.size());

  // Set all iterators to the beginning of their corresponding vectors in multiList
  for (int i = 0; i < multiList.size(); ++i) iterators[i] = multiList[i].begin();

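  int minValue;
  size_t minValueIndex = 0;

  while (true) {
    // Scan the current head of every non-exhausted vector for the smallest value.
    minValue = INT_MAX;
    for (size_t i = 0; i < iterators.size(); ++i) {
      if (iterators[i] == multiList[i].end()) continue;

      if (*iterators[i] < minValue) {
        minValue = *iterators[i];
        minValueIndex = i;
      }
    }

    // Stop before touching any iterator: if nothing was found, minValueIndex is
    // stale and advancing it would step past .end(). INT_MAX doubles as the
    // sentinel, so inputs that contain INT_MAX itself are not handled.
    if (minValue == INT_MAX) break;

    finalList.push_back(minValue);
    ++iterators[minValueIndex];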
  }

  return finalList;
}

Randomized simulation

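In short, I wrote a simple randomized simulation that builds a multidimensional vector<vector<int>>. The multidimensional vector starts with 2 vectors of size 2 each and ends with 600 vectors of size 600 each. Every vector is sorted, and on each iteration both the outer container and every sub-vector grow by two elements. I time how long each algorithm takes to run as follows: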

clock_t clock_a_start = clock();
finalList = mergeInefficient(multiList);
clock_t clock_a_stop = clock();

clock_t clock_b_start = clock();
finalList = mergeInefficientV2(multiList);
clock_t clock_b_stop = clock();
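The question does not show how the clock() readings are turned into the plotted values. One plausible way, assuming the plot shows elapsed CPU seconds, is to divide the tick difference by CLOCKS_PER_SEC. The helper name elapsedSeconds is hypothetical.

```cpp
#include <ctime>

// Hypothetical helper: convert a pair of clock() readings to CPU seconds.
// clock() measures processor time used by the program, not wall-clock time.
double elapsedSeconds(std::clock_t start, std::clock_t stop) {
  return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

With the variables above, the two measurements would be elapsedSeconds(clock_a_start, clock_a_stop) and elapsedSeconds(clock_b_start, clock_b_stop).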

I then built the following plot:

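My calculations suggest that the two naive solutions (merge and selection) have the same time complexity, but the plot above shows them behaving very differently. At first I rationalized this by saying that one probably has more overhead than the other, but then I realized that overhead should be a constant factor and should not produce a plot like this one. What is the explanation? Is my complexity analysis wrong?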

Answer 1

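Even if two algorithms have the same complexity (O(nk^2) in your case), they may end up with very different running times, depending on the size of your input and the "constant" factors involved.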

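For example, if one algorithm runs in n/1000 time and another runs in 1000n time, they both have the same asymptotic complexity, but they will have very different running times for "reasonable" choices of n.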

Moreover, effects caused by caching, compiler optimizations, and so on can change the running time significantly.

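In your case, although your complexity calculation appears to be correct, the actual running time should be (nk^2 + nk)/2 in the first case, whereas in the second case it should be nk^2. Note that the division by 2 may be significant, because as k increases the nk term becomes negligible.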
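To make those two counts concrete, here is a small sketch that tallies unit steps for each method, assuming k vectors of exactly n elements each. The function names are hypothetical, and these are step counts rather than timings, so they say nothing about cache or allocation effects.

```cpp
#include <cstdint>

// Elements written by the naive merge: the j-th merge (j = 1 .. k-1)
// produces a vector of (j + 1) * n elements.
std::uint64_t naiveMergeWork(std::uint64_t n, std::uint64_t k) {
  std::uint64_t work = 0;
  for (std::uint64_t j = 1; j < k; ++j) work += (j + 1) * n;
  return work;
}

// Iterator inspections by the naive selection: all k iterators are
// examined once for each of the n * k output elements.
std::uint64_t naiveSelectionWork(std::uint64_t n, std::uint64_t k) {
  return n * k * k;
}
```

For n = k = 600 these give 108,179,400 and 216,000,000 steps respectively. The first count is n less than (nk^2 + nk)/2 because the first vector is copied rather than merged, and the ratio between the two is close to the factor of 2 described above.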

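As a third algorithm, you could modify the naive selection by maintaining a heap of k elements that holds the first element of each of the k vectors. Your selection step would then take O(log k) time, so the complexity should drop to O(nk log k).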
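The heap-based variant described above can be sketched as follows. This is one possible implementation, assuming a std::priority_queue used as a min-heap over (value, source index) pairs. The name mergeWithHeap is hypothetical and does not appear in the question.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch of the heap-based selection.
std::vector<int> mergeWithHeap(const std::vector<std::vector<int>>& multiList) {
  // Each heap entry is (value, index of the vector it came from).
  using Entry = std::pair<int, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

  std::vector<std::size_t> pos(multiList.size(), 0);  // next unread position per vector
  std::size_t total = 0;
  for (std::size_t i = 0; i < multiList.size(); ++i) {
    total += multiList[i].size();
    if (!multiList[i].empty()) heap.push({multiList[i][0], i});
  }

  std::vector<int> finalList;
  finalList.reserve(total);
  while (!heap.empty()) {
    Entry top = heap.top();  // smallest remaining head
    heap.pop();              // O(log k)
    finalList.push_back(top.first);
    std::size_t i = top.second;
    // Replace the consumed element with the next one from the same vector.
    if (++pos[i] < multiList[i].size()) heap.push({multiList[i][pos[i]], i});
  }
  return finalList;
}
```

Unlike the INT_MAX sentinel in the naive selection, exhaustion here is tracked by the heap becoming empty, so any int value is a valid input.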

Answer 1: 3, accepted, 2016-08-29 03:28:37
