What is a more efficient way for my program to write every billionth combination?

Question

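The program below generates combinations of the characters in the main string that you will see in the code. It first generates all 48-choose-12 combinations, then continues up through 48-choose-19.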

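The problem is that the total number of combinations is 65 trillion, which cannot be computed in a reasonable amount of time. I thought, "Fine, I will just write every billionth one to the file." But that also takes a huge amount of time, because the program still has to count up to 65 trillion even if it only writes every billionth combination.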

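Is there anything I can change in my program so that it avoids counting up to such a large number, but still writes every billionth combination to a file?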

#include <iostream>
#include <string>
#include <fstream>
#include <vector>

using namespace std;

template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
   if ((first == last) || (first == k) || (last == k))
      return false;
   Iterator i1 = first;
   Iterator i2 = last;
   ++i1;
   if (last == i1)
      return false;
   i1 = last;
   --i1;
   i1 = k;
   --i2;
   while (first != i1)
   {
      if (*--i1 < *i2)
      {
         Iterator j = k;
         while (!(*i1 < *j)) ++j;
         std::iter_swap(i1,j);
         ++i1;
         ++j;
         i2 = k;
         std::rotate(i1,j,last);
         while (last != j)
         {
            ++j;
            ++i2;
         }
         std::rotate(k,i2,last);
         return true;
      }
   }
   std::rotate(first,k,last);
   return false;
}

unsigned long long count = 0;

int main()
{
  ofstream myfile;
  myfile.open ("m = 8.txt");

  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";

  for (int i = 12; i <= 19; i++)
  {
    std::size_t comb_size = i;

    do
    { 
      if (count == 0)
        myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl;

      if (++count % 1000000000 == 0)
        myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl;

    }while(next_combination(s.begin(),s.begin()+ comb_size,s.end()));
  }

  myfile.close();

  cout << "Done!" << endl;

  system("PAUSE");
  return 0;
}

Answer 1

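I have a simple transformation of your code, using a different library, that is about 36 times faster than yours. It is still brute force. But whereas on my machine I estimate your code would take 418 days to finish, mine takes only about 3.65 days. That is still quite long, but it brings the job down to a long weekend.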

Here is my code:

#include <iostream>
#include <string>
#include <fstream>
#include "../combinations/combinations"

using namespace std;

unsigned long long count = 0;

int main()
{
  ofstream myfile("m = 8.txt");

  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";

  for (int i = 12; i <= 19; i++)
     for_each_combination(s.begin(), s.begin() + i, s.end(),
        [&](std::string::const_iterator f, std::string::const_iterator l) -> bool
        {
          if (::count++ % 1000000000 == 0)
            myfile << std::string(f, l) << std::endl;
          return false;
        });

  myfile.close();

  cout << "Done!" << endl;
  return 0;
}

Reducing the amount of testing done on count in the inner loop gave a 15% performance increase.

"../combinations/combinations" refers to this library:

http://howardhinnant.github.io/combinations.html

The link includes a description and the full source code.

This test program can also easily be modified to count the total number of combinations:

#include <iostream>
#include <string>
#include "../combinations/combinations"

using namespace std;


int main()
{
  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";
  unsigned long long count = 0;
  for (int i = 12; i <= 19; i++)
     count += count_each_combination(s.begin(), s.begin() + i, s.end());

  cout << "Done! " << count << endl;
  return 0;
}

which outputs:

Done! 27189132782091

The code is open source under the Boost license (it is not part of the Boost libraries). Feel free to use it.
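The total printed above can be cross-checked without enumerating anything, since it is just the sum of the binomial coefficients C(48, 12) through C(48, 19). The following is a minimal sketch; `binomial` and `total_combinations` are hypothetical helper names, not part of the library, and the 64-bit arithmetic is only assumed safe for sizes around n = 48.

```cpp
#include <cstdint>

// C(n, k) by the multiplicative formula. After step i the value equals
// C(n - k + i, i), so every division is exact.
// Assumption: n is small enough (n = 48 here) that result * (n - k + i)
// never overflows 64 bits.
std::uint64_t binomial(unsigned n, unsigned k) {
    if (k > n) return 0;
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}

// Sum of C(n, k) for k_min <= k <= k_max.
std::uint64_t total_combinations(unsigned n, unsigned k_min, unsigned k_max) {
    std::uint64_t total = 0;
    for (unsigned k = k_min; k <= k_max; ++k)
        total += binomial(n, k);
    return total;
}
```

`total_combinations(48, 12, 19)` gives 27189132782091, the same figure as the `count_each_combination` run.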

Answer 2

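Here is code I wrote earlier to find the k-th permutation of a given string. I think my idea is similar to @Tarik's: we do not need to list all the permutations that come before the k-th one.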

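#include <cmath>   // ceil
#include <string>

using namespace std;

// Returns the k-th (1-based) permutation of s in lexicographic order.
// Assumes the characters of s are sorted and 1 <= k <= s.size()!.
string getPermutation(string s, int k) {
    string res;
    int n = s.size();
    int total = 1, digits = n - 1;
    for (int i = 1; i < n; ++i)
        total *= i;             // total = (n - 1)!
    while ((int) res.size() < n)
    {
        // index of the block of `total` permutations that contains k
        int i = (int) ceil(k * 1.0 / total) - 1;
        res += s[i];
        s.erase(s.begin() + i); // erase from string is not a good idea:)
        k = (k % total == 0) ? total : k % total;
        total = (total == 1) ? 1 : total / digits--;
    }
    return res;
}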

It works for short strings. For example, getPermutation("12345", 37) returns 24135.

However, for your string s of length 48, the variable total would overflow even if its type were long long, so extra work is needed to handle that.

My code is a bit hard to understand :) ~~You can improve my code.~~ I hope this helps.

UPDATE: I realized that what you need is combinations, not permutations. I was completely wrong! Forget my code :)

Answer 3

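The algorithm from http://en.wikipedia.org/wiki/Combinadic computes the k-th combination directly. You first need to store Pascal's triangle. If you need a code example, you can look at this one (in Python): https://github.com/sagemath/sagelib/blob/master/sage/combinat/choose_nk.py .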

Answer 4

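You can use bit vectors to speed up some of the calculations. The code below is adapted from the bit-twiddling pages of the Chess Programming Wiki.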

#include <iostream>
#include <iomanip>
#include <cstdint>

using U64 = uint64_t;

// generate the next integer with the same number of bits as c
U64 next_combination(U64 c) 
{
    auto const smallest = c & -c;
    auto const ripple = c + smallest;
    auto ones = c ^ ripple;
    ones = (ones >> 2) / smallest;
    return ripple | ones;
}

// generate all integers with k of the first n bits set
template<class Function>
void for_each_combination(std::size_t n, std::size_t k, Function fun)
{
    U64 y;
    auto const n_mask = (1ULL << n) - 1; // mask with all n bits set to 1
    auto const k_mask = (1ULL << k) - 1; // mask with first k bits set to 1

    auto x = k_mask; fun(x);
    for (; (y = next_combination(x) & n_mask) > x; x = y) fun(y);
}

int main() 
{
    auto const million = 1000000ULL;
    auto count = U64 { 0 };
    for (auto i = 12; i < 20; ++i) {
        for_each_combination(48, i, [&](U64 c) {
            // Uncomment to print every millionth combination as a hex bit mask:
            // if (count % million == 0)
            //     std::cout << std::dec << std::setfill(' ') << std::setw(8) << count / million << ": "
            //               << std::hex << std::showbase << std::setfill('0') << std::setw(16) << c << "\n";
            ++count;
        });
    }
    std::cout << count << "\n";
}

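On a single core inside a VirtualBox on my Xeon E5-1650 @ 3.2 GHz, my best estimate is that it would take 3.52 days to increment the counter 2.7e13 times (without generating the output itself). It only works for subsets B(n, k) with n < 64, unless you use some 128-bit integer class.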

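Given a bit vector with k of its n bits set to 1, it is straightforward to map it back to the original sequence of characters, or to any other type, and print whichever combinations are required. For sequences without random-access iterators, it is of course more expensive than Howard Hinnant's approach.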
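As a sketch of that mapping, assuming bit i (counting from the least significant bit) selects character `s[i]`; `mask_to_string` is a hypothetical helper name, not part of the code above:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Collect the characters of s whose bit positions are set in mask.
// Only the first 64 characters of s can be addressed by a 64-bit mask.
std::string mask_to_string(std::uint64_t mask, const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size() && i < 64; ++i)
        if (mask & (std::uint64_t{1} << i))
            out += s[i];
    return out;
}
```

Inside the lambda passed to `for_each_combination`, a call such as `mask_to_string(c, s)` would recover the characters. Note that the masks arrive in increasing integer order, which is generally not the same sequence as the lexicographic enumeration in the question, so the sampled combinations may differ.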

Answer 5

If you do not care what the actual count is, you can use a 32-bit int, which will still let you know when you have reached 1 billion.
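One way to read this suggestion is a small countdown that resets itself, so no 64-bit total and no modulo are needed. The class below is a hypothetical sketch, not code from the answer, and it does not reduce the number of combinations that must be visited.

```cpp
#include <cstdint>

// Fires once every `period` calls using only a 32-bit counter.
// Assumption: period > 0 (1000000000 fits in 32 bits).
class EveryNth {
public:
    explicit EveryNth(std::uint32_t period) : period_(period), left_(period) {}

    // Returns true on every period-th call, false otherwise.
    bool tick() {
        if (--left_ != 0) return false;
        left_ = period_;  // restart the countdown
        return true;
    }

private:
    std::uint32_t period_;
    std::uint32_t left_;
};
```

In the question's loop, `if (every.tick()) myfile << ...;` with `EveryNth every(1000000000);` would replace the `++count % 1000000000 == 0` test.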

5 answers

Answer 1: score 4, 2013-07-15 21:40:06

Answer 2: score 2, 2013-07-15 15:35:50

Answer 3: score 1, 2013-07-16 18:24:44

Answer 4: score 1, 2013-07-17 20:40:07

Answer 5: score -1, 2013-07-15 15:16:48
