What is a more efficient way for my program to write every billionth combination?

Question

The following program generates combinations of the characters in a master string, which you will see in the program. First the program generates all of the 48 choose 12 combinations, and then continues all the way up to 48 choose 19.

The problem is that the total number of combinations is 65 trillion, which is not possible to compute in a reasonable amount of time. I thought, "OK, I will just write every billionth one to the file." But that also takes a ton of time, because the program still has to count to 65 trillion, even if it only writes every billionth combination.

Is there anything I could change in my program so that it does not have to count to such an extraordinarily large number, but still writes every billionth combination to a file?

#include <algorithm>  // std::iter_swap, std::rotate
#include <cstdlib>    // system
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

// Advances [first, k) to the next k-combination of the elements of
// [first, last) in lexicographic order; the range must start out sorted.
// After the last combination it restores the sorted order and returns false.
template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
   if ((first == last) || (first == k) || (last == k))
      return false;
   Iterator i1 = first;
   Iterator i2 = last;
   ++i1;
   if (last == i1)
      return false;
   i1 = k;
   --i2;
   while (first != i1)
   {
      if (*--i1 < *i2)
      {
         Iterator j = k;
         while (!(*i1 < *j)) ++j;
         std::iter_swap(i1,j);
         ++i1;
         ++j;
         i2 = k;
         std::rotate(i1,j,last);
         while (last != j)
         {
            ++j;
            ++i2;
         }
         std::rotate(k,i2,last);
         return true;
      }
   }
   std::rotate(first,k,last);
   return false;
}

int main()
{
  // A local counter avoids the ambiguity between a global `count` and
  // std::count that `using namespace std;` would otherwise cause.
  unsigned long long count = 0;

  ofstream myfile("m = 8.txt");

  // 48 characters, already in ascending ASCII order as next_combination requires
  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";

  for (int i = 12; i <= 19; i++)
  {
    std::size_t comb_size = i;

    do
    {
      // Write the very first combination, then every billionth one after it.
      if (count == 0)
        myfile << std::string(s.begin(), s.begin() + comb_size) << std::endl;

      if (++count % 1000000000 == 0)
        myfile << std::string(s.begin(), s.begin() + comb_size) << std::endl;

    } while (next_combination(s.begin(), s.begin() + comb_size, s.end()));
  }

  myfile.close();

  cout << "Done!" << endl;

  system("PAUSE");  // Windows-only: keeps the console window open
  return 0;
}

Answer 1

I've got a simple transformation to use a different library, which is about 36X faster than yours. It is still brute force. But while on my machine I estimate your code will take 418 days to complete, my code will take only about 3.65 days. Still outrageously long, but it gets it down to a long weekend.

Here's my code:

#include <iostream>
#include <string>
#include <fstream>
#include "../combinations/combinations"

using namespace std;

unsigned long long count = 0;

int main()
{
  ofstream myfile("m = 8.txt");

  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";

  for (int i = 12; i <= 19; i++)
     for_each_combination(s.begin(), s.begin() + i, s.end(),
        [&](std::string::const_iterator f, std::string::const_iterator l) -> bool
        {
          if (::count++ % 1000000000 == 0)
            myfile << std::string(f, l) << std::endl;
          return false;
        });

  myfile.close();

  cout << "Done!" << endl;
  return 0;
}

Cutting the number of tests on count in the inner loop (from two per combination to one) gave a 15% performance increase.

"../combinations/combinations" refers to this library:

http://howardhinnant.github.io/combinations.html

The link includes a description and full source code.

This test program can also easily be modified to count the total number of combinations:

#include <iostream>
#include <string>
#include "../combinations/combinations"

using namespace std;


int main()
{
  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";
  unsigned long long count = 0;
  for (int i = 12; i <= 19; i++)
     count += count_each_combination(s.begin(), s.begin() + i, s.end());

  cout << "Done! " << count << endl;
  return 0;
}

which outputs:

Done! 27189132782091

The code is open source under the Boost license (it is not part of the Boost libraries). Feel free to use it.

Answer 2

This is code I wrote earlier for finding the kth permutation of a given string. I think my idea is similar to @Tarik's: we don't need to list all the permutations before the kth.

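#include <cmath>   // ceil
#include <string>

using namespace std;

// Returns the k-th (1-based) permutation of s, in lexicographic order
// when the characters of s are distinct and already sorted.
string getPermutation(string s, int k) {
    string res;
    int n = s.size();
    int total = 1, digits = n - 1;   // total = (n - 1)!
    for (int i = 1; i < n; ++i)
        total *= i;
    while ((int) res.size() < n)
    {
        int i = 0;
        for (int m = 1; m < (int) ceil(k * 1.0 / total); ++m)
            i++;
        res += s[i];
        s.erase(s.begin() + i); // erase from string is not a good idea:)
        k = (k % total == 0) ? total : k % total;
        total = (total == 1) ? 1 : total / digits--;
    }
    return res;
}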

It works well for short strings. For example, getPermutation("12345", 37) returns "24135" (k counts from 1).

But for your string s of length 48, the variable total overflows even with type long long, so extra work would be needed to handle this.

My code is somewhat hard to understand :) ~~You may improve on my code.~~ I hope this helps.

UPDATE: I realize that what you need is combinations, not permutations. I went totally wrong! Forget my code :)

Answer 3

At http://en.wikipedia.org/wiki/Combinadic there is an algorithm to compute the k-th combination directly. You first need to store Pascal's triangle. If you need a code example, have a look at (Python) https://github.com/sagemath/sagelib/blob/master/sage/combinat/choose_nk.py .

Answer 4

You can use a bitvector to speed up some of the computations, adapted from the bit-twiddling pages of the Chess Programming Wiki.

#include <iostream>
#include <iomanip>
#include <cstdint>

using U64 = uint64_t;

// generate the next integer with the same number of bits as c
U64 next_combination(U64 c) 
{
    auto const smallest = c & -c;
    auto const ripple = c + smallest;
    auto ones = c ^ ripple;
    ones = (ones >> 2) / smallest;
    return ripple | ones;
}

// generate all integers with k of the first n bits set
template<class Function>
void for_each_combination(std::size_t n, std::size_t k, Function fun)
{
    U64 y;
    auto const n_mask = (1ULL << n) - 1; // mask with all n bits set to 1
    auto const k_mask = (1ULL << k) - 1; // mask with first k bits set to 1

    auto x = k_mask; fun(x);
    for (; (y = next_combination(x) & n_mask) > x; x = y) fun(y);
}

int main() 
{
    auto const million = 1000000ULL;
    auto count = U64 { 0 };
    for (auto i = 12; i < 20; ++i) {
        for_each_combination(48, i, [&](U64 c) {
        /*if (count++ % million == 0) std::cout << std::dec << std::setfill(' ') << std::setw(8) << (count - 1) / million << ": " << std::hex << std::showbase << std::setfill('0') << std::setw(16) << c << "\n";*/
            ++count;
        });
    }
    std::cout << count << "\n";
}

On a single core inside a VirtualBox VM on my Xeon E5-1650 @ 3.2 GHz, my best estimate is that it will take 3.52 days to increment the counter 2.7e13 times (without generating any output). It only works for subsets B(n, k) with n < 64, unless you use some 128-bit integer class.

Given a bitvector that has k out of n bits set to 1, it is straightforward to map it to the original sequence of chars (or any other type) and print whatever combination is required. For sequences without random-access iterators, it is of course more expensive than Howard Hinnant's approach.

Answer 5

If you don't care what the actual count is, use a 32-bit int; it will still let you know when you have reached one billion.


5 answers

Answer 1: score 4, 2013-07-15 21:40:06

Answer 2: score 2, 2013-07-15 15:35:50

Answer 3: score 1, 2013-07-16 18:24:44

Answer 4: score 1, 2013-07-17 20:40:07

Answer 5: score -1, 2013-07-15 15:16:48
