Efficiently processing each unique permutation of a vector when the number of unique elements is much smaller than the vector size

Question

In my program, I need to apply a function in parallel to each unique permutation of a vector. The vector size is about N = 15.

I already have a function void parallel_for_each_permutation that I can combine with a std::set so that each unique permutation is processed only once.

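This all works well for the general case. In my use case, however, the number k of unique elements per vector is very limited, typically around k = 4. This means I am currently wasting time constructing the same unique permutation over and over, only to discard it because it has already been processed.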

In this special case, is it possible to process all unique permutations without constructing all N! permutations?

Example use case:

#include <algorithm>
#include <thread>
#include <vector>
#include <mutex>
#include <numeric>
#include <set>
#include <iostream>

template<class Container1, class Container2>
struct Comp{
    //compare element-wise less than
    bool operator()(const Container1& l, const Container2& r) const{
        auto pair = std::mismatch(l.begin(), l.end(), r.begin());
        if(pair.first == l.end() && pair.second == r.end())
            return false;
        return *(pair.first) < *(pair.second);
    }
};

template<class Container, class Func>
void parallel_for_each_permutation(const Container& container, int num_threads, Func func){
    auto ithPermutation = [](int n, size_t i) -> std::vector<size_t>{
        // https://stackoverflow.com/questions/7918806/finding-n-th-permutation-without-computing-others
        std::vector<size_t> fact(n);
        std::vector<size_t> perm(n);

        fact[0] = 1;
        for(int k = 1; k < n; k++)
            fact[k] = fact[k-1] * k;

        for(int k = 0; k < n; k++){
            perm[k] = i / fact[n-1-k];
            i = i % fact[n-1-k];
        }

        for(int k = n-1; k > 0; k--){
            for(int j = k-1; j >= 0; j--){
                if(perm[j] <= perm[k])
                    perm[k]++;
            }
        }

        return perm;
    };

    size_t totalNumPermutations = 1;
    for(size_t i = 1; i <= container.size(); i++)
        totalNumPermutations *= i;

    std::vector<std::thread> threads;

    for(int threadId = 0; threadId < num_threads; threadId++){
        threads.emplace_back([&, threadId](){
            // Integer arithmetic keeps the range boundaries exact; a float has only
            // 24 mantissa bits and cannot represent values like 15! exactly.
            const size_t firstPerm = threadId * totalNumPermutations / num_threads;
            const size_t last_excl = (threadId + 1) * totalNumPermutations / num_threads;

            Container permutation(container);

            auto permIndices = ithPermutation(container.size(), firstPerm);

            size_t count = firstPerm;
            do{
                for(int i = 0; i < int(permIndices.size()); i++){
                    permutation[i] = container[permIndices[i]];
                }

                func(threadId, permutation);
                std::next_permutation(permIndices.begin(), permIndices.end());
                ++count;
            }while(count < last_excl);
        });
    }

    for(auto& thread : threads)
        thread.join();
}

template<class Container, class Func>
void parallel_for_each_unique_permutation(const Container& container, Func func){
    using Comparator = Comp<Container, Container>;
    constexpr int numThreads = 4;

    std::set<Container, Comparator> uniqueProcessedPermutations(Comparator{});
    std::mutex m;

    parallel_for_each_permutation(
        container,
        numThreads,
        [&](int threadId, const auto& permutation){

            {
                std::lock_guard<std::mutex> lg(m);
                if(uniqueProcessedPermutations.count(permutation) > 0){
                    return;
                }else{
                    uniqueProcessedPermutations.insert(permutation);
                }
            }

            func(permutation);
        }
    );
}

int main(){
    std::vector<int> vector1{1,1,1,1,2,3,2,2,3,3,1};

    auto func = [](const auto& vec){return;};

    parallel_for_each_unique_permutation(vector1, func);
}

Answer 1

In combinatorics, the permutations you have to work with are known as multiset permutations.

They are described, for example, on The Combinatorial Object Server, and in detail by Professor Tadao Takaoka.

There is some related Python code available, and some C++ code in the open-source FXT library.

You might consider adding the "multiset" and "combinatorics" tags to your question.

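One possibility is to borrow the (header-only) algorithmic code from the FXT library, which provides a simple generator class for these multiset permutations.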

Performance:

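Using the FXT algorithm on the 15-object test vector {1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4}, all 12,612,600 associated "permutations" can be generated in less than 2 seconds on a plain, ordinary Intel x86-64 machine. This is without diagnostic text I/O and without any attempt at optimization.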
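The figure of 12,612,600 is the multinomial coefficient 15! / (3! * 3! * 4! * 5!), that is, N! divided by the factorial of each value's repetition count. A minimal sketch for computing it from a vector (countUniquePermutations is a hypothetical helper, not part of FXT) that never forms N! itself:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Number of distinct arrangements of a multiset:
//   N! / (r_0! * r_1! * ... * r_{k-1}!)
// computed as a running product of binomial coefficients, so no
// intermediate value ever has to hold N! itself.
std::uint64_t countUniquePermutations(const std::vector<int>& items) {
    std::map<int, std::uint64_t> freq;  // value -> repetition count r_i
    for (int x : items)
        ++freq[x];

    std::uint64_t result = 1;
    std::uint64_t placed = 0;  // number of positions filled so far
    for (const auto& entry : freq) {
        for (std::uint64_t j = 1; j <= entry.second; ++j) {
            ++placed;
            // result * placed is always divisible by j here, so the division is exact
            result = result * placed / j;
        }
    }
    return result;
}
```

For the question's vector1 {1,1,1,1,2,3,2,2,3,3,1}, this gives 11! / (5! * 3! * 3!) = 9,240 unique permutations, against 39,916,800 raw ones.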

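The algorithm generates exactly the required "permutations", nothing more. So there is no longer any need to generate all 15! "raw" permutations, nor to use mutual exclusion to update a shared data structure for filtering.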

Adapter class for generating the permutations:

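Below, I will try to provide code for an adapter class that lets your application use the FXT algorithm while keeping the dependency confined to a single implementation file. That way, the code will hopefully fit better into your application: FXT uses its own ulong type and raw pointers, whereas your code uses std::vector<std::size_t>. Besides, FXT is a very extensive library.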

Header file for the "adapter" class:

// File:  MSetPermGen.h

#ifndef  MSET_PERM_GEN_H
#define  MSET_PERM_GEN_H

#include  <iostream>
#include  <vector>

class MSetPermGenImpl;  // from algorithmic backend

using  IntVec  = std::vector<int>;
using  SizeVec = std::vector<std::size_t>;

// Generator class for multiset permutations:

class MSetPermGen {
public:
    MSetPermGen(const IntVec& vec);

    std::size_t       getCycleLength() const;
    bool              forward(size_t incr);
    bool              next();
    const SizeVec&    getPermIndices() const;
    const IntVec&     getItems() const;
    const IntVec&     getItemValues() const;

private: 
    std::size_t       cycleLength_;
    MSetPermGenImpl*  genImpl_;         // implementation generator
    IntVec            itemValues_;      // only once each
    IntVec            items_;           // copy of ctor argument
    SizeVec           freqs_;           // repetition counts
    SizeVec           state_;           // indices into itemValues_, each in 0..k-1
};

#endif

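The class constructor takes exactly the argument type used in your main program. The key method is, of course, next(). You can also use the forward(incr) method to move the automaton several steps at once.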

Example client program:

// File:  test_main.cpp

#include  <cassert>
#include  <cstdlib>   // EXIT_SUCCESS
#include  "MSetPermGen.h"

using  std::cout;
using  std::cerr;
using  std::endl;

// utility functions:

std::vector<int>  getMSPermutation(const MSetPermGen& mspg)
{
    std::vector<int>  res;
    const auto& indices = mspg.getPermIndices();  // each entry in 0..k-1, k = number of distinct values
    const auto& values  = mspg.getItemValues();   // the distinct values, in increasing order

    std::size_t n = indices.size();
    assert( n == mspg.getItems().size() );
    res.reserve(n);

    for (std::size_t i=0; i < n; i++) {
        auto xi = indices[i];
        res.push_back(values[xi]);
    }

    return res;
}

void printPermutation(const std::vector<int>& p, std::ostream& fh)
{
    std::size_t n = p.size();

    for (size_t i=0; i < n; i++)
        fh << p[i] << " ";
    fh << '\n';
}

int main(int argc, const char* argv[])
{
    std::vector<int>  vec0{1,1, 2,2,2};                        // N=5
    std::vector<int>  vec1{1,1, 1,1, 2, 3, 2,2, 3,3, 1};       // N=11
    std::vector<int>  vec2{1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4};  // N=15

    MSetPermGen  pg0{vec0};
    MSetPermGen  pg1{vec1};
    MSetPermGen  pg2{vec2};

    auto pg = &pg0;  // choice of 0, 1, 2 for sizing
    auto cl = pg->getCycleLength();

    auto permA = getMSPermutation(*pg);
    printPermutation(permA, cout);
    for (std::size_t pi=0; pi < (cl-1); pi++) {
        pg->next();
        auto permB = getMSPermutation(*pg);
        printPermutation(permB, cout);
    }

    return EXIT_SUCCESS;
}

Text output of the small program above:

You get only 10 items from the vector {1,1,2,2,2}, because 5! / (2! * 3!) = 120 / (2 * 6) = 10.

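The implementation file of the adapter class, MSetPermGen.cpp, consists of two parts. The first part is FXT code with minimal adaptations. The second part is the MSetPermGen class proper.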

First part of the implementation file:

// File:  MSetPermGen.cpp - part 1 of 2 - FXT code

// -------------- Beginning  of header-only FXT combinatorics code -----------

 // This file is part of the FXT library.
 // Copyright (C) 2010, 2012, 2014 Joerg Arndt
 // License: GNU General Public License version 3 or later,
 // see the file COPYING.txt in the main directory.

//--  https://www.jjj.de/fxt/ 
//--  https://fossies.org/dox/fxt-2018.07.03/mset-perm-lex_8h_source.html

#include  <cstddef>
using ulong = std::size_t;

inline void  swap2(ulong& xa, ulong& xb)
{
    ulong  save_xb = xb;

    xb = xa;
    xa = save_xb;
}

class mset_perm_lex
 // Multiset permutations in lexicographic order, iterative algorithm.
 {
 public:
     ulong k_;    // number of different sorts of objects
     ulong *r_;   // number of elements '0' in r[0], '1' in r[1], ..., 'k-1' in r[k-1]
     ulong n_;    // number of objects
     ulong *ms_;  // multiset data in ms[0], ..., ms[n-1], sentinels at [-1] and [-2]

 private:  // have pointer data
     mset_perm_lex(const mset_perm_lex&);  // forbidden
     mset_perm_lex & operator = (const mset_perm_lex&);  // forbidden

 public:
     explicit mset_perm_lex(const ulong *r, ulong k)
     {
         k_ = k;
         r_ = new ulong[k];
         for (ulong j=0; j<k_; ++j)  r_[j] = r[j];  // get buckets

         n_ = 0;
         for (ulong j=0; j<k_; ++j)  n_ += r_[j];
         ms_ = new ulong[n_+2];
         ms_[0] = 0; ms_[1] = 1;  // sentinels:  ms[0] < ms[1]
         ms_ += 2;  // nota bene

         first();
     }

     void first()
     {
         for (ulong j=0, i=0;  j<k_;  ++j)
             for (ulong h=r_[j];  h!=0;  --h, ++i)
                 ms_[i] = j;
     }

     ~mset_perm_lex()
     {
         ms_ -= 2;
         delete [] ms_;
         delete [] r_;
     }

     const ulong * data()  const { return ms_; }

     ulong next()
     // Return position of leftmost change,
     // return n with last permutation.
     {
         // find rightmost pair with ms[i] < ms[i+1]:
         const ulong n1 = n_ - 1;
         ulong i = n1;
         do  { --i; }  while ( ms_[i] >= ms_[i+1] );  // can read sentinel
         if ( (long)i < 0 )  return n_;  // last sequence is falling seq.

         // find rightmost element ms[j] greater than ms[i]:
         ulong j = n1;
         while ( ms_[i] >= ms_[j] )  { --j; }

         swap2(ms_[i], ms_[j]);

         // Here the elements ms[i+1], ..., ms[n-1] are a falling sequence.
         // Reverse order to the right:
         ulong r = n1;
         ulong s = i + 1;
         while ( r > s )  { swap2(ms_[r], ms_[s]);  --r;  ++s; }

         return i;
     } 
 };

// -------------- End of header-only FXT combinatorics code -----------
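To make the three steps of mset_perm_lex::next() easier to follow, here is a sketch of the same lexicographic successor on a std::vector, without the sentinel cells at ms_[-1] and ms_[-2] (nextMultisetPerm is a hypothetical name, not part of FXT). Like the FXT routine, it returns the position of the leftmost change, or n when the input is already the last arrangement:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Lexicographic successor of a multiset arrangement, same steps as
// mset_perm_lex::next() but bounds-checked instead of sentinel-based.
// Returns the position of the leftmost change, or ms.size() (leaving ms
// unchanged) if ms is already non-increasing, i.e. the last arrangement.
std::size_t nextMultisetPerm(std::vector<std::size_t>& ms) {
    const std::size_t n = ms.size();
    if (n < 2) return n;

    // 1. find the rightmost i with ms[i] < ms[i+1]
    std::size_t i = n - 1;
    while (i > 0 && ms[i - 1] >= ms[i]) --i;
    if (i == 0) return n;  // whole sequence is non-increasing
    --i;

    // 2. find the rightmost j with ms[j] > ms[i] (exists, since ms[i+1] > ms[i])
    std::size_t j = n - 1;
    while (ms[j] <= ms[i]) --j;
    std::swap(ms[i], ms[j]);

    // 3. ms[i+1..n-1] is non-increasing: reverse it to get the smallest tail
    for (std::size_t r = n - 1, s = i + 1; r > s; --r, ++s)
        std::swap(ms[r], ms[s]);
    return i;
}
```

Starting from the sorted indices {0,0,1,1,1} (the encoding of {1,1,2,2,2}), it succeeds 9 times and then reports the end, which matches the 10 arrangements counted above.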

Second part of the class implementation file:

// Second part of file MSetPermGen.cpp: non-FXT code

#include  <cassert>
#include  <tuple>
#include  <map>
#include  <iostream>
#include  <cstdio>

#include  "MSetPermGen.h"

using  std::cout;
using  std::cerr;
using  std::endl;

class MSetPermGenImpl {  // wrapper class
public:
    MSetPermGenImpl(const SizeVec& freqs) : fg(freqs.data(), freqs.size())
    {}
private:
    mset_perm_lex   fg;

    friend class MSetPermGen;
};

static std::size_t  fact(size_t n)
{
    std::size_t  f = 1;

    for (std::size_t i = 1; i <= n; i++)
        f = f*i;
    return f;
}

MSetPermGen::MSetPermGen(const IntVec& vec) : items_(vec)
{
    std::map<int,int>  ma;

    for (int i: vec) {
        ma[i]++;
    }
    int item, freq;
    for (const auto& p : ma) {
       std::tie(item, freq) = p;
       itemValues_.push_back(item);
       freqs_.push_back(freq);
    }
    cycleLength_ = fact(items_.size());
    for (auto i: freqs_)
        cycleLength_ /= fact(i);

    // create FXT-level generator:
    genImpl_ = new MSetPermGenImpl(freqs_);
    for (std::size_t i=0; i < items_.size(); i++)
        state_.push_back(genImpl_->fg.ms_[i]);
}

std::size_t  MSetPermGen::getCycleLength() const
{
    return cycleLength_;
}

bool  MSetPermGen::forward(size_t incr)
{
    std::size_t  n  = items_.size();
    std::size_t  rc = 0;

    // move forward state by brute force, could be improved:
    for (std::size_t i=0; i < incr; i++) 
        rc = genImpl_->fg.next();

    for (std::size_t j=0; j < n; j++)
        state_[j] = genImpl_->fg.ms_[j];
    return (rc != n);
}

bool  MSetPermGen::next()
{
    return forward(1);
}

const SizeVec&  MSetPermGen::getPermIndices() const
{
    return (this->state_);
}

const IntVec&  MSetPermGen::getItems() const
{
    return (this->items_);
}

const IntVec&  MSetPermGen::getItemValues() const
{
    return (this->itemValues_);
}
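forward() advances the FXT automaton one step at a time, as its own comment admits, so thread t spends O(firstPerm) steps before doing useful work. One possible improvement, sketched here under the assumption that the state is the 0..k-1 index sequence in lexicographic order that mset_perm_lex produces, is to compute the starting arrangement directly by counting how many arrangements begin with each candidate value (unrankMultisetPermutation and multisetCount are hypothetical helpers):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of distinct arrangements given only the repetition counts.
std::uint64_t multisetCount(const std::vector<std::uint64_t>& freqs) {
    std::uint64_t result = 1, placed = 0;
    for (std::uint64_t c : freqs)
        for (std::uint64_t j = 1; j <= c; ++j) {
            ++placed;
            result = result * placed / j;  // exact at every step
        }
    return result;
}

// Returns the rank-th (0-based) multiset permutation in lexicographic order,
// as indices 0..k-1 into the distinct values. freqs[v] is the repetition
// count of the v-th smallest value. Precondition: rank < multisetCount(freqs).
std::vector<std::size_t> unrankMultisetPermutation(std::vector<std::uint64_t> freqs,
                                                   std::uint64_t rank) {
    std::uint64_t n = 0;
    for (auto c : freqs) n += c;

    std::vector<std::size_t> perm;
    perm.reserve(n);
    for (std::uint64_t pos = 0; pos < n; ++pos) {
        for (std::size_t v = 0; v < freqs.size(); ++v) {
            if (freqs[v] == 0) continue;
            --freqs[v];                                  // tentatively place v here
            std::uint64_t block = multisetCount(freqs);  // arrangements of the rest
            if (rank < block) {                          // target is in this block
                perm.push_back(v);
                break;
            }
            rank -= block;                               // skip the whole block
            ++freqs[v];                                  // undo, try the next value
        }
    }
    return perm;
}
```

Since mset_perm_lex::ms_ is public, the result could in principle be copied into genImpl_->fg.ms_ in place of the brute-force loop; that integration is untested. The cost is O(n^2 k) per call, negligible for n = 15 and k = 4.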

Adapting the parallel application:

For a multithreaded application, given that generating the "permutations" is cheap, you can afford to create one generator object per thread.

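Before starting the actual computation, forward each generator to its proper initial position, that is, to step thread_id * permShare, where permShare is cycleLength / num_threads rounded up so that the last thread also covers the remainder.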
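When cycleLength is not a multiple of num_threads, the remainder has to go somewhere, and a thread may even end up with nothing to do. A small sketch of an exact split (splitRange is a hypothetical helper) that gives each thread a half-open range [first, last) using integer arithmetic only:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Splits [0, total) into numThreads contiguous half-open ranges [first, last).
// Consecutive ranges share their boundary exactly, so nothing is skipped or
// processed twice, and range sizes differ by at most one.
// Precondition: numThreads > 0.
std::vector<std::pair<std::size_t, std::size_t>> splitRange(std::size_t total,
                                                             std::size_t numThreads) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base  = total / numThreads;
    const std::size_t extra = total % numThreads;  // the first `extra` threads get one more
    std::size_t first = 0;
    for (std::size_t t = 0; t < numThreads; ++t) {
        const std::size_t len = base + (t < extra ? 1 : 0);
        ranges.emplace_back(first, first + len);
        first += len;
    }
    return ranges;
}
```

A thread whose range is empty (first == last) should skip its loop entirely; a do-while loop would process one arrangement before checking the bound.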

I have tried to adapt your code to this MSetPermGen class along these lines; see the code below.

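With 3 threads, the size-15 input vector {1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4} (giving 12,612,600 permutations) and all diagnostics enabled, the modified parallel program runs in under 10 seconds; with all diagnostics switched off, it runs in under 2 seconds.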

Modified parallel program:

#include  <algorithm>
#include  <thread>
#include  <vector>
#include  <atomic>
#include  <mutex>
#include  <numeric>
#include  <set>
#include  <iostream>
#include  <fstream>
#include  <sstream>
#include  <cstdlib>

#include  "MSetPermGen.h"

using  std::cout;
using  std::endl;

// debug and instrumentation:
static std::atomic<size_t>  permCounter;
static bool doManagePermCounter = true;
static bool doThreadLogfiles    = true;
static bool doLogfileHeaders    = true;

template<class Container, class Func>
void parallel_for_each_permutation(const Container& container, int numThreads, Func mfunc) {

    MSetPermGen  gen0(container);
    std::size_t totalNumPermutations = gen0.getCycleLength();
    std::size_t permShare = totalNumPermutations / numThreads;
    if ((totalNumPermutations % numThreads) != 0)
        permShare++;
    std::cout << "totalNumPermutations: " << totalNumPermutations << std::endl;

    std::vector<std::thread>  threads;

    for (int threadId = 0; threadId < numThreads; threadId++) {
        threads.emplace_back([&, threadId]() {

            // generate some per-thread logfile name
            std::ostringstream  fnss;
            fnss << "thrlog_" << threadId << ".txt";
            std::string    fileName = fnss.str();
            std::ofstream  fh(fileName);

            MSetPermGen  thrGen(container);
            const std::size_t firstPerm = permShare * threadId;
            thrGen.forward(firstPerm);

            const std::size_t last_excl = std::min(totalNumPermutations,
                                             (threadId+1) * permShare);

            if (doLogfileHeaders) {
                fh << "MSG threadId: "  << threadId << '\n';
                fh << "MSG firstPerm: " << firstPerm << '\n';
                fh << "MSG lastExcl : " << last_excl << '\n';
            }

            Container permutation(container);            
            auto values      = thrGen.getItemValues();
            auto permIndices = thrGen.getPermIndices();
            auto nsz         = permIndices.size();

            std::size_t count = firstPerm;
            // while (not do-while): a thread with an empty share processes nothing
            while (count < last_excl) {
                for (std::size_t i = 0; i < nsz; i++) {
                    permutation[i] = values[permIndices[i]];
                }

                mfunc(threadId, permutation);

                if (doThreadLogfiles) {
                    for (std::size_t i = 0; i < nsz; i++)
                        fh << permutation[i] << ' ';
                    fh << '\n';
                }
                thrGen.next();
                permIndices = thrGen.getPermIndices();
                ++count;
                if (doManagePermCounter) {
                    permCounter++;
                }
            }

            fh.close();
        });
    }

    for(auto& thread : threads)
        thread.join();
}

template<class Container, class Func>
void parallel_for_each_unique_permutation(const Container& container, Func func) {
    constexpr int numThreads = 3;

    parallel_for_each_permutation(
        container,
        numThreads,
        [&](int threadId, const auto& permutation){
            // no longer need any mutual exclusion
            func(permutation);
        }
    );
}


int main()
{
    std::vector<int>  vector1{1,1,1,1,2,3,2,2,3,3,1};             // N=11
    std::vector<int>  vector0{1,1, 2,2,2};                        // N=5
    std::vector<int>  vector2{1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4};  // N=15

    auto func = [](const auto& vec) { return; };

    permCounter.store(0);

    parallel_for_each_unique_permutation(vector2, func);

    auto finalPermCounter = permCounter.load();
    cout << "FinalPermCounter = " << finalPermCounter << endl;

}

Answer 2

Yes: the <algorithm> header contains generic algorithms for permutations, namely std::next_permutation(), std::prev_permutation() and std::is_permutation().

Admittedly, none of them gives you the i-th permutation directly; you have to iterate.
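One property worth spelling out: std::next_permutation moves to the next lexicographically greater arrangement, so on a range with repeated values it never yields the same arrangement twice. Applied to the values themselves (rather than to a vector of distinct indices, as in the question), it therefore visits each unique permutation exactly once, with no std::set filtering. A minimal sketch (allUniquePermutations is a hypothetical name):

```cpp
#include <algorithm>
#include <vector>

// Collects every distinct arrangement of `items`, each exactly once.
// Starting from sorted input, std::next_permutation walks all arrangements
// in lexicographic order and returns false after the last one.
std::vector<std::vector<int>> allUniquePermutations(std::vector<int> items) {
    std::vector<std::vector<int>> result;
    std::sort(items.begin(), items.end());  // smallest arrangement first
    do {
        result.push_back(items);
    } while (std::next_permutation(items.begin(), items.end()));
    return result;
}
```

For {1,1,2,2,2} this returns 10 vectors, from 1 1 2 2 2 up to 2 2 2 1 1.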

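Unless you find a way to compute the i-th permutation efficiently, consider using the same algorithms for partitioning: separate the elements holding the rarest value from the rest, and use the placements of those rare elements to partition the work.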
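One possible reading of this partitioning suggestion, sketched below (the helper name and the round-robin assignment of work items to parts are assumptions, not part of the answer): each choice of positions for the rarest value is a work item, and within a work item the remaining positions receive every distinct arrangement of the other elements via std::next_permutation. Every unique permutation corresponds to exactly one (positions, arrangement) pair, so the parts never overlap.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Visits the unique permutations of `items` that belong to `part`
// (0 <= part < numParts). Work item w (one placement of the rarest value)
// goes to part w % numParts. Precondition: numParts > 0.
template <class Func>
void forEachUniquePermutationInPart(const std::vector<int>& items, std::size_t numParts,
                                    std::size_t part, Func func) {
    if (items.empty()) {
        if (part == 0) func(items);
        return;
    }
    // Find the rarest value (first one on ties) and its count r.
    std::map<int, std::size_t> freq;
    for (int x : items) ++freq[x];
    auto rarest = std::min_element(freq.begin(), freq.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; });
    const int rareValue = rarest->first;
    const std::size_t r = rarest->second;
    const std::size_t n = items.size();

    // Remaining elements, sorted so next_permutation starts at the first arrangement.
    std::vector<int> rest;
    for (int x : items)
        if (x != rareValue) rest.push_back(x);
    std::sort(rest.begin(), rest.end());

    // Position selector: 1 marks a slot for rareValue; {0..0,1..1} is the smallest.
    std::vector<int> slots(n, 0);
    std::fill(slots.begin() + (n - r), slots.end(), 1);

    std::vector<int> perm(n);
    std::size_t workItem = 0;
    do {
        if (workItem++ % numParts != part) continue;  // jumps to the loop condition
        std::vector<int> tail = rest;
        do {
            // Merge: rareValue in the marked slots, tail elements elsewhere in order.
            for (std::size_t i = 0, t = 0; i < n; ++i)
                perm[i] = slots[i] ? rareValue : tail[t++];
            func(perm);
        } while (std::next_permutation(tail.begin(), tail.end()));
    } while (std::next_permutation(slots.begin(), slots.end()));
}
```

Calling it with part = 0, 1, ..., numParts - 1 on separate threads needs no shared state. For the question's vector1, the rarest value is 2 (3 copies, tied with 3), giving C(11, 3) = 165 work items of 56 arrangements each, 9,240 in total.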

2 answers

Answer 1: score 1, accepted, 2019-08-17 23:03:44

Answer 2: score 0, 2019-08-03 15:49:48
