Speeding up 16 nested for loops in C++/Rcpp

Question

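I have a very computationally expensive procedure: I need to run 16 nested for loops to check every possible combination of one element from each of 16 numeric vectors, each of size 26. My first attempt was in R (my preferred language), but I quickly moved to C++ via the Rcpp package. I can run the code locally on my PC (4 cores, Intel i7-6600U CPU @ 2.60GHz, 16GB RAM), but I also have access to Azure cloud computing and can spin up a cluster of any size.

My current code looks like this: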

#include <Rcpp.h>
#include <math.h>
#include <iostream>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix optimalIndex(NumericVector a, NumericVector b, NumericVector c, NumericVector d, NumericVector e, NumericVector f,
                           NumericVector g, NumericVector h, NumericVector i, NumericVector j, NumericVector k, NumericVector l,
                           NumericVector m, NumericVector n, NumericVector o, NumericVector p){
  NumericMatrix outp(1000000, 16);
  int index = 0;
  int minsum = 0;
  for(int c1 = 0; c1 < a.size(); c1++){
    for(int c2 = 0; c2 < b.size(); c2++){
      for(int c3 = 0; c3 < c.size(); c3++){
        for(int c4 = 0; c4 < d.size(); c4++){
          for(int c5 = 0; c5 < e.size(); c5++){
            for(int c6 = 0; c6 < f.size(); c6++){
              for(int c7 = 0; c7 < g.size(); c7++){
                for(int c8 = 0; c8 < h.size(); c8++){
                  for(int c9 = 0; c9 < i.size(); c9++){
                    for(int c10 = 0; c10 < j.size(); c10++){
                      for(int c11 = 0; c11 < k.size(); c11++){
                        for(int c12 = 0; c12 < l.size(); c12++){
                          for(int c13 = 0; c13 < m.size(); c13++){
                            for(int c14 = 0; c14 < n.size(); c14++){
                              for(int c15 = 0; c15 < o.size(); c15++){
                                for(int c16 = 0; c16 < p.size(); c16++){
                                  minsum = a(c1) + b(c2) + c(c3) + d(c4) + e(c5) + f(c6)
                                            + g(c7) + h(c8) + i(c9) + j(c10) + k(c11) + l(c12)
                                            + m(c13) + n(c14) + o(c15) + p(c16);
                                  if(minsum == 0){
                                    outp(index, 0) = c1;
                                    outp(index, 1) = c2;
                                    outp(index, 2) = c3;
                                    outp(index, 3) = c4;
                                    outp(index, 4) = c5;
                                    outp(index, 5) = c6;
                                    outp(index, 6) = c7;
                                    outp(index, 7) = c8;
                                    outp(index, 8) = c9;
                                    outp(index, 9) = c10;
                                    outp(index, 10) = c11;
                                    outp(index, 11) = c12;
                                    outp(index, 12) = c13;
                                    outp(index, 13) = c14;
                                    outp(index, 14) = c15;
                                    outp(index, 15) = c16;
                                    index++;
                                  }
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  return(outp);
}

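The dimensions of this function's output, outp, are not known in advance, so I arbitrarily chose 1 million rows. For every combination whose sum matches the specified condition (i.e. = 0), I want to return one row holding the index chosen from each vector, one index per column.

Obviously, this would take years to run. I am not sure whether parallelization is an option for this loop, or what other approaches I could use to speed it up. As I said, I can run this in Azure with more cores and/or more memory if that helps.

Is there a better/faster way to do this?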

Answer 1

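I don't think it is possible to run this program in a reasonable amount of time, because 26^16 equals 43,608,742,899,428,874,059,776. I think the problem can be solved much faster with dynamic programming over the state dp[SUM][letterIndex]: you precompute how many combinations of elements sum to SUM using the first letterIndex vectors. The complexity of this solution is O(26*16*MAX), where MAX is the size of the range of possible partial sums (roughly 16 times the largest absolute value in the vectors). Of course, this only works if the numbers in the vectors are integers.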
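The dp[SUM][letterIndex] idea can be sketched roughly as follows. This is a minimal illustration in plain C++ (no Rcpp), not the answerer's own code: the function name countCombinations and its target parameter are hypothetical, the values are assumed to be integers, and a hash map keyed by partial sum stands in for the dp table so that negative sums need no offset.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the original post): counts how many ways one
// element can be picked from each vector so that the picks sum to `target`.
// Assumes integer values. Counts are stored as double because the total
// number of combinations (26^16, about 4.4e22) does not fit in 64 bits;
// the result is exact only while counts stay below 2^53.
double countCombinations(const std::vector<std::vector<int>>& vectors,
                         long long target = 0) {
  // ways[s] = number of ways to reach partial sum s using the vectors
  // processed so far (one dp layer per vector).
  std::unordered_map<long long, double> ways;
  ways[0] = 1.0;  // the empty selection has sum 0

  for (const std::vector<int>& vec : vectors) {
    std::unordered_map<long long, double> next;
    for (const auto& entry : ways) {
      for (int value : vec) {
        // Extend every reachable partial sum by each element of this vector.
        next[entry.first + value] += entry.second;
      }
    }
    ways.swap(next);
  }

  auto it = ways.find(target);
  return it == ways.end() ? 0.0 : it->second;
}
```

For example, countCombinations({{1, -1}, {1, -1}}) returns 2, since (1, -1) and (-1, 1) are the only zero-sum picks. The sketch only counts solutions, which also tells you how many rows outp would need. Listing the index tuples themselves would additionally require keeping each dp layer and walking back through them, and that step is still proportional to the number of solutions.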

Solution 1
0 2020-05-26 16:44:55
