简体   繁体   中英

Parallel algorithm to produce all possible sequences of a set

I'm having difficult time while trying to produce all possible strings of a given character set. Let S be set of symbols. I need to process all possible combinations of S in length n . For example if S={'a','b','+','-'} and n=4 the algorithm should process following sequences:

aaaa
aaab
abab
+aa-
// And all other sequences in the universe

Currently my algorithm is a non-efficient recursive algorithm described below. I have two question:

  1. Is there a faster algorithm?
  2. Is there a parallel algorithm to solve this problem?

Current implementation: (Simplified)

void workhorse(vector<char> &input, vector<char>::iterator i)
{
    if(i==inputs.end()) {
        // process the input
        return;
    }
    else {
        for( const auto& symbol : S) {
            *i=symbol;
            workhorse(input, i+1);
        }
    }
}   

Do you need all permutations or combinations? From your example it seems like combinations (order doesn't matter, symbols can be repeated) but in your code it looks like you may be trying to generate permutations (guessing by the function name). Combinations is a simple matter of counting in base n - in this case 4^4 and permutations would be the much fewer 4! but slightly more complicated recursive algorithm depending if you want to preserve lexicographic order. Either way the algorithms are among the basic pillars of computer science and well covered already, try these other Q's:

Generating all Possible Combinations

Generate list of all possible permutations of a string

Your algorithm already looks pretty efficient, you're not wasting any work. The only thing that could probably be improved slightly is the function call overhead due to recursion. But recursion is good, because it allows for easy parallelization:

#include <thread>
#include <array>
#include <string>
#include <vector>
using namespace std;

array<char,3> S = {{ 'a', 'b', 'c' }};
const int split_depth = 2;
void workhorse(string& result, int i) {
    if (i == result.size()) {
        // process the input
        return;
    }
    if (i == split_depth) {
        vector<thread> threads;
        for (char symbol : S) {
            result[i] = symbol;
            threads.emplace_back([=] {
                string cpy(result);
                workhorse(cpy, i + 1);
            });
        }
        for (thread& t: threads) t.join();
    } else {
        for (char symbol : S) {
            result[i] = symbol;
            workhorse(result, i + 1);
        }
    }
}

int main() {
    string res(6, 0);
    workhorse(res, 0);
}

Make sure to compile it with C++11 features and threading enabled, for example

$ g++ -O3 -std=c++11 -lpthread [file].cpp

This version of the function will enumerate all prefixes of up to length split_depth sequentially and then spawn a thread to further process each of those. So it will start |S|^split_depth threads in total, which you can adapt to match your hardware concurrency.

You could just do it iteratively. But it won't be much faster.

Imagine that characters in the set are digits.

'a' = 0, 'b' = 1, '+' = 2, '-' = 3

you start from 0000 and then increment it until you reach 3333.

0000, 0001, 0002, 0003, 0010, 0011, and so on...

This can be easily parallelised. For two threads let the first one do the job from 0000 to 1333 and the other from 2000 to 3333. Obviously this can be easily extended to an arbitrary number of threads.

There is nothing more to do. If your program is slow that's because there are so many combinations to choose. The time needed to find all the combinations by this code linearly depends on the number of combinations that there are.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM