How to (cheaply) calculate all possible length-r combinations of n possible elements

Question

What is the fastest way to calculate all possible length-r combinations of n possible elements without resorting to brute force techniques or anything that requires STL?

While working on an Apriori algorithm for the final project in my data structures class, I developed an interesting solution that uses bit-shifting and recursion, which I will share in an answer below for anyone who is interested. However, is this the fastest way of achieving this (without using any common libraries)?

I ask more out of curiosity than anything else, as the algorithm I currently have works just fine for my purposes.

Answer 1

Here is the algorithm that I developed to solve this problem. It currently just outputs each combination as a series of ones and zeros, but it can easily be adapted to create data sets based on an array of possible elements.

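#include <cmath>    // log2
#include <iostream> // cout, endl
using namespace std;

// Should be called with arguments (2^r)-1, 2^(r-1), 2^(n-1)
void r_nCr(const unsigned int &startNum, const unsigned int &bitVal, const unsigned int &testNum)
{
    // Step 1: remove bitVal's bit, shift left, and refill the right-most bit unless bitVal is 0
    unsigned int n = (startNum - bitVal) << 1;
    n += bitVal ? 1 : 0;

    // Step 2: print the combination as a series of 1s and 0s
    for (unsigned int i = log2(testNum) + 1; i > 0; i--)
        cout << ((n >> (i - 1)) & 1);
    cout << endl;

    // Step 3: the left-most bit is still 0, so recurse with n as the new startNum
    if (!(n & testNum) && n != startNum)
        r_nCr(n, bitVal, testNum);

    // Step 4: recurse with the next bit to the right as the new bitVal
    if (bitVal && bitVal < testNum)
        r_nCr(startNum, bitVal >> 1, testNum);
}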

How it works:

This function treats each combination of elements as a sequence of ones and zeros, which can then be expressed with respect to a set of possible elements (but is not in this particular example).

For example, the results of 3C2 (all combinations of length-2 from a set of 3 possible elements) can be expressed as 011, 110, and 101. If the set of all possible elements is {A, B, C}, then the results can be expressed with respect to this set as {B, C}, {A, B}, and {A, C}.
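
As a rough sketch of that mapping (not part of the original answer), a bit pattern can be turned back into a subset by walking the bits from left to right. The helper name subsetFromMask is hypothetical, and the sketch assumes the left-most bit corresponds to the first element, which is the convention the 3C2 example uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): returns the elements
// selected by `mask`. The left-most of the elements.size() bits corresponds
// to elements[0], matching the 011 -> {B, C} convention in the example.
std::vector<std::string> subsetFromMask(unsigned int mask,
                                        const std::vector<std::string> &elements)
{
    // Assumption: unsigned int holds at least 32 bits.
    assert(!elements.empty() && elements.size() <= 32);
    std::vector<std::string> subset;
    const std::size_t count = elements.size();
    for (std::size_t i = 0; i < count; ++i) {
        // Bit (count - 1 - i) is the i-th bit counted from the left.
        if ((mask >> (count - 1 - i)) & 1u)
            subset.push_back(elements[i]);
    }
    return subset;
}
```

With elements {"A", "B", "C"}, the masks 011, 110, and 101 (3, 6, and 5 in decimal) select {B, C}, {A, B}, and {A, C} respectively.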

For this explanation, I will be calculating 5C3 (all length-3 combinations composed of 5 possible elements).

This function accepts 3 arguments, all of which are unsigned integers:

The first parameter is the smallest possible integer whose binary representation has a number of 1s equal to the length of the combinations we're creating. This is our starting value for generating combinations. For 5C3, this would be 00111b, or 7 in decimal.
The second parameter is the value of the highest bit that is set to 1 in the starting number. This is the first bit that will be subtracted when creating the combinations. For 5C3, this is the third bit from the right, which has a value of 4.
The third parameter is the value of the nth bit from the right, where n is the number of possible elements that we are combining. This number will be bitwise-anded with the combinations we create to check whether the left-most bit of the combination is a 1 or a 0. For 5C3, we will use the 5th bit from the right, which is 10000b, or 16 in decimal.
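
The three starting values can be derived from n and r with shifts. This is only a sketch: the InitialArgs struct and the initialArgs function are hypothetical names, and it assumes 1 <= r <= n <= 31 so that every shift fits in a 32-bit unsigned int.

```cpp
#include <cassert>

// Hypothetical container for the three initial arguments of r_nCr.
struct InitialArgs {
    unsigned int startNum; // (2^r) - 1: the r right-most bits set
    unsigned int bitVal;   // 2^(r-1): highest set bit of startNum
    unsigned int testNum;  // 2^(n-1): the n-th bit from the right
};

InitialArgs initialArgs(unsigned int n, unsigned int r)
{
    // Assumption: 1 <= r <= n <= 31, so no shift reaches the width of unsigned int.
    assert(r >= 1 && r <= n && n <= 31);
    return InitialArgs{(1u << r) - 1u, 1u << (r - 1u), 1u << (n - 1u)};
}
```

For 5C3, initialArgs(5, 3) gives 7, 4, and 16, the same values as in the descriptions above.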

Here are the actual steps that the function performs:

Step 1: Calculate startNum - bitVal, bit-shift one space to the left, and add 1 if bitVal is not 0.

For the first iteration, the result should be the same as startNum. This is so that we can print out the first combination (which is equal to startNum) within the function so we don't have to do it manually ahead of time. The math for this operation occurs as follows:

00111 - 00100 = 00011    
00011 << 1 = 00110   
00110 + 1 = 00111
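
The same arithmetic can be isolated in a small function to check individual cases. This is a sketch; stepOne is a hypothetical name, and the assertion only guards against the subtraction wrapping around.

```cpp
#include <cassert>

// Hypothetical helper isolating step 1 of r_nCr.
unsigned int stepOne(unsigned int startNum, unsigned int bitVal)
{
    assert(startNum >= bitVal);                // the subtraction must not wrap around
    unsigned int n = (startNum - bitVal) << 1; // remove bitVal, then shift left by one
    n += bitVal ? 1u : 0u;                     // add 1 only when bitVal is not 0
    return n;
}
```

stepOne(7, 4) returns 7 (00111), reproducing the calculation above, while stepOne(7, 2) returns 11 (01011).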

Step 2: The result of the previous calculation is a new combination. Do something with this data.

We are going to print the result to the console. This is done using a for-loop whose variable i starts out equal to the number of bits we are working with (calculated by taking log2 of testNum and adding 1; log2(16) + 1 = 4 + 1 = 5) and counts down until it reaches 0, at which point the loop stops. Each iteration, we bit-shift n right by i-1 and print the right-most bit by and-ing the result with 1. Here is the math:

i=5:
00111 >> 4 = 00000
00000 & 00001 = 0

i=4:
00111 >> 3 = 00000
00000 & 00001 = 0

i=3:
00111 >> 2 = 00001
00001 & 00001 = 1

i=2:
00111 >> 1 = 00011
00011 & 00001 = 1

i=1:
00111 >> 0 = 00111
00111 & 00001 = 1

output: 00111
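
A possible variation on this printing step, sketched here under the assumption that building a string is acceptable, avoids the floating-point log2 call by counting the bits of testNum with shifts. Both toBits and bitCount are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: formats the low `width` bits of n, left-most bit first.
std::string toBits(unsigned int n, unsigned int width)
{
    // Assumption: unsigned int holds at least 32 bits.
    assert(width >= 1 && width <= 32);
    std::string out;
    for (unsigned int i = width; i > 0; i--)
        out += ((n >> (i - 1)) & 1u) ? '1' : '0';
    return out;
}

// Hypothetical helper: number of bits needed to reach testNum's set bit,
// found by shifting right until nothing is left (replaces log2(testNum) + 1).
unsigned int bitCount(unsigned int testNum)
{
    unsigned int bits = 0;
    while (testNum) {
        bits++;
        testNum >>= 1;
    }
    return bits;
}
```

For the example above, toBits(7, bitCount(16)) yields the string 00111.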

Step 3: If the left-most bit of n (the result of the calculation in step 1) is 0 and n is not equal to startNum, we recurse with n as the new startNum.

Obviously this will be skipped on the first iteration, as we have already shown that n is equal to startNum. This becomes important in subsequent iterations, which we will see later.

Step 4: If bitVal is greater than 0 and less than testNum, recurse with the current iteration's original startNum as the first argument. The second argument is bitVal shifted right by 1 (the same thing as integer division by 2).

We now recurse with the new bitVal set to the value of the next bit to the right of the current bitVal. This next bit is what will be subtracted in the next iteration.

Step 5: Continue to recurse until bitVal becomes equal to zero.

Because bitVal is bit-shifted right by one in the second recursive call, we will eventually reach a point where bitVal equals 0. This algorithm expands as a tree, and when bitVal equals zero and the left-most bit is 1, we return to one layer up from our current position. Eventually, this cascades all the way back to the root.

In this example, the tree has 3 subtrees and 6 leaf nodes. I will now step through the first subtree, which consists of 1 root node and 3 leaf nodes.

We will start at the last line of the first iteration, which is

if (bitVal && bitVal < testNum)
    r_nCr(startNum, bitVal >> 1, testNum);

So we now enter the second iteration with startNum = 00111 (7), bitVal = 00010 (2), and testNum = 10000 (16) (this number never changes).

Second Iteration

Step 1:

n = 00111 - 00010 = 00101 // Subtract bitVal
n = 00101 << 1 = 01010 // Shift left
n = 01010 + 1 = 01011 // bitVal is not 0, so add 1

Step 2: Print result.

Step 3: The left-most bit is 0 and n is not equal to startNum, so we recurse with n as the new startNum. We now enter the third iteration with startNum = 01011 (11), bitVal = 00010 (2), and testNum = 10000 (16).

Third Iteration

Step 1:

n = 01011 - 00010 = 01001 // Subtract bitVal
n = 01001 << 1 = 10010 // Shift left
n = 10010 + 1 = 10011 // bitVal is not 0, so add 1

Step 2: Print result.

Step 3: The left-most bit is 1, so do not recurse.

Step 4: bitVal is not 0, so recurse with bitVal shifted right by 1. We now enter the fourth iteration with startNum = 01011 (11), bitVal = 00001 (1), and testNum = 10000 (16).

Fourth Iteration

Step 1:

n = 01011 - 00001 = 01010 // Subtract bitVal
n = 01010 << 1 = 10100 // Shift left
n = 10100 + 1 = 10101 // bitVal is not 0, so add 1

Step 2: Print result.

Step 3: The left-most bit is 1, so do not recurse.

Step 4: bitVal is not 0, so recurse with bitVal shifted right by 1. We now enter the fifth iteration with startNum = 01011 (11), bitVal = 00000 (0), and testNum = 10000 (16).

Fifth Iteration

Step 1:

n = 01011 - 00000 = 01011 // Subtract bitVal
n = 01011 << 1 = 10110 // Shift left
n = 10110 + 0 = 10110 // bitVal is 0, so add 0
// Because bitVal = 0, nothing is subtracted or added; this step becomes just a straight bit-shift left by 1.

Step 2: Print result.

Step 3: The left-most bit is 1, so do not recurse.

Step 4: bitVal is 0, so do not recurse.

Return to Second Iteration

Step 4: bitVal is not 0, so recurse with bitVal shifted right by 1.

This will continue on until bitVal = 0 for the first level of the tree and we return to the first iteration, at which point we will return from the function entirely.
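
To check the whole walk-through at once, the recursion can be written so that it collects each combination instead of printing it. This is a sketch rather than the original function: collect_nCr and combinations are hypothetical names, the arguments are passed by value, and the wrapper assumes 1 <= r <= n <= 31.

```cpp
#include <cassert>
#include <vector>

// Same recursion as r_nCr, but each combination is appended to `out`
// instead of being printed.
void collect_nCr(unsigned int startNum, unsigned int bitVal, unsigned int testNum,
                 std::vector<unsigned int> &out)
{
    unsigned int n = (startNum - bitVal) << 1; // step 1
    n += bitVal ? 1u : 0u;
    out.push_back(n);                          // step 2: record instead of print

    if (!(n & testNum) && n != startNum)       // step 3
        collect_nCr(n, bitVal, testNum, out);

    if (bitVal && bitVal < testNum)            // step 4
        collect_nCr(startNum, bitVal >> 1, testNum, out);
}

// Hypothetical wrapper: derives the three starting arguments from n and r.
std::vector<unsigned int> combinations(unsigned int n, unsigned int r)
{
    // Assumption: 1 <= r <= n <= 31, so every shift fits in a 32-bit unsigned int.
    assert(r >= 1 && r <= n && n <= 31);
    std::vector<unsigned int> out;
    collect_nCr((1u << r) - 1u, 1u << (r - 1u), 1u << (n - 1u), out);
    return out;
}
```

For 5C3, combinations(5, 3) produces 10 values in this order: 00111, 01011, 10011, 10101, 10110, 01101, 11001, 11010, 01110, 11100. The first five match the five iterations stepped through above.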

Here is a simple diagram showing the function's tree-like expansion: [image: diagram showing the recursive expansion]

And here is a more complicated diagram showing the function's thread of execution: [image: diagram showing the thread of execution]

Here is an alternate version using bitwise-or in place of addition and bitwise-xor in place of subtraction:

// Should be called with arguments (2^r)-1, 2^(r-1), 2^(n-1)
void r_nCr(const unsigned int &startNum, const unsigned int &bitVal, const unsigned int &testNum)
{
    // XOR clears bitVal's bit (in place of subtraction); OR sets the right-most bit (in place of addition)
    unsigned int n = (startNum ^ bitVal) << 1;
    n |= (bitVal != 0);

    // Prints combination as a series of 1s and 0s
    for (unsigned int i = log2(testNum) + 1; i > 0; i--)
        cout << ((n >> (i - 1)) & 1);
    cout << endl;

    if (!(n & testNum) && n != startNum)
        r_nCr(n, bitVal, testNum);

    if (bitVal && bitVal < testNum)
        r_nCr(startNum, bitVal >> 1, testNum);
}
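
The two versions agree because XOR with bitVal equals subtracting bitVal whenever that bit is set in startNum, and OR with 1 equals adding 1 right after a left shift, since the right-most bit is then 0. The sketch below puts both forms of step 1 side by side; stepArithmetic and stepBitwise are hypothetical names, and the assertion states the precondition that bitVal's bit is present in startNum.

```cpp
#include <cassert>

// Step 1 written with subtraction and addition, as in the first version.
unsigned int stepArithmetic(unsigned int startNum, unsigned int bitVal)
{
    return ((startNum - bitVal) << 1) + (bitVal ? 1u : 0u);
}

// Step 1 written with XOR and OR, as in the alternate version.
unsigned int stepBitwise(unsigned int startNum, unsigned int bitVal)
{
    // XOR only clears the bit if it is currently set in startNum.
    assert((startNum & bitVal) == bitVal);
    return ((startNum ^ bitVal) << 1) | (bitVal != 0 ? 1u : 0u);
}
```

For every (startNum, bitVal) pair in the 5C3 walk-through, such as (7, 2) or (11, 1), the two functions return the same value.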

Answer 2

What about this?

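#include <stdio.h>

#define SETSIZE 3 /* r: number of bits set in each combination */
#define NELEMS  7 /* n: number of possible elements (at most 8 with this print macro) */

#define BYTETOBINARYPATTERN "%d%d%d%d%d%d%d%d"
#define BYTETOBINARY(byte)  \
    ((byte) & 0x80 ? 1 : 0), \
    ((byte) & 0x40 ? 1 : 0), \
    ((byte) & 0x20 ? 1 : 0), \
    ((byte) & 0x10 ? 1 : 0), \
    ((byte) & 0x08 ? 1 : 0), \
    ((byte) & 0x04 ? 1 : 0), \
    ((byte) & 0x02 ? 1 : 0), \
    ((byte) & 0x01 ? 1 : 0)

int main(void)
{
    unsigned long long x = (1ULL << SETSIZE) - 1; /* smallest value with SETSIZE bits set */
    unsigned long long N = (1ULL << NELEMS) - 1;  /* all NELEMS bits set */

    while (x <= N) /* <= so that the SETSIZE == NELEMS case still prints its one combination */
    {
        printf("x: " BYTETOBINARYPATTERN "\n", BYTETOBINARY(x));
        unsigned long long a = x & -x;   /* lowest set bit of x */
        unsigned long long y = x + a;    /* carry the lowest run of 1s one position up */
        x = ((y & -y) / a >> 1) + y - 1; /* next larger value with the same number of 1s */
    }
    return 0;
}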

It should print all 35 combinations of 7C3, one 8-bit pattern per line.
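
To make the update step easier to test, it can be pulled out into a function and the patterns collected in a vector. This is only a sketch: nextCombination and allCombinations are hypothetical names, the wrapper assumes 1 <= r <= n <= 62 so that nothing overflows 64 bits, and 0ULL - x is used in place of -x purely to avoid compiler warnings about negating an unsigned value.

```cpp
#include <cassert>
#include <vector>

// The update step from the loop, isolated: returns the next larger value
// that has the same number of 1 bits as x.
unsigned long long nextCombination(unsigned long long x)
{
    assert(x != 0);                                    // a would be 0 and the division would fail
    unsigned long long a = x & (0ULL - x);             // lowest set bit of x
    unsigned long long y = x + a;                      // carry the lowest run of 1s one position up
    return ((y & (0ULL - y)) / a >> 1) + y - 1;        // put the rest of the run back at the bottom
}

// Hypothetical wrapper: every pattern with r of the low n bits set, in increasing order.
std::vector<unsigned long long> allCombinations(unsigned int n, unsigned int r)
{
    // Assumption: 1 <= r <= n <= 62, so x + a cannot overflow 64 bits.
    assert(r >= 1 && r <= n && n <= 62);
    std::vector<unsigned long long> out;
    const unsigned long long limit = 1ULL << n;
    for (unsigned long long x = (1ULL << r) - 1; x < limit; x = nextCombination(x))
        out.push_back(x);
    return out;
}
```

allCombinations(7, 3) returns 35 values, from 0000111 (7) up to 1110000 (112). Unlike the recursive version in Answer 1, the values come out in increasing numeric order.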

solution1
9 ACCEPTED 2014-12-05 03:51:08

solution2
2 2014-12-05 09:41:19
