How to generate a sequence of integers with predefined *uniqueness*?

For some testing I need to generate a potentially long, non-random sequence of integers with a predefined uniqueness. I define uniqueness as a floating-point number equal to the "number of unique numbers in the sequence" divided by the "total sequence length". This number should lie in the half-open interval (0, 1].

I might need these sequences in different lengths, which are unknown in advance, so I need an algorithm that generates a sequence in which every prefix has a uniqueness close to the predefined one. For example, the sequence 1,2,...,m,1,2,...,n, whose uniqueness is max(m,n)/(m+n), is not good for me.
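To make the requirement concrete, here is a minimal sketch of how the uniqueness of every prefix could be measured. The helper names `prefixUniqueness`, `twoRuns` and `maxDeviation` are hypothetical (not part of the original code), and the `warmup` parameter is an assumption added so that the first few prefixes, which are always far from the target, can be skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// result[i] = (distinct values in seq[0..i]) / (i + 1).
std::vector<double> prefixUniqueness(const std::vector<std::uint64_t>& seq) {
  std::unordered_set<std::uint64_t> seen;
  std::vector<double> result;
  result.reserve(seq.size());
  for (std::size_t i = 0; i < seq.size(); ++i) {
    seen.insert(seq[i]);
    result.push_back(static_cast<double>(seen.size()) /
                     static_cast<double>(i + 1));
  }
  return result;
}

// Builds the counterexample 1,2,...,m,1,2,...,n from the text.
std::vector<std::uint64_t> twoRuns(std::uint64_t m, std::uint64_t n) {
  std::vector<std::uint64_t> seq;
  for (std::uint64_t k = 1; k <= m; ++k) seq.push_back(k);
  for (std::uint64_t k = 1; k <= n; ++k) seq.push_back(k);
  return seq;
}

// Largest |prefix uniqueness - target| over prefixes longer than `warmup`.
double maxDeviation(const std::vector<double>& u, double target,
                    std::size_t warmup) {
  assert(target > 0.0 && target <= 1.0);
  double worst = 0.0;
  for (std::size_t i = warmup; i < u.size(); ++i) {
    worst = std::max(worst, std::fabs(u[i] - target));
  }
  return worst;
}
```

For `twoRuns(4, 4)` the whole sequence has uniqueness 4/8 = 0.5, but the prefix of length 4 has uniqueness 1.0, which is exactly the kind of drift that rules this sequence out.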

The problem looks simple, because the algorithm only has to generate a single sequence. However, the function next() that I wrote (see below) looks unexpectedly complex, and it also uses a lot of memory:

#include <cassert>
#include <cstdint>
#include <map>
#include <set>

typedef std::set<uint64_t> USet;
typedef std::map<unsigned, USet> CMap;

const double uniq = 0.25;    // --- the predefined uniqueness 
uint64_t totalSize = 0;      // --- current sequence length
uint64_t uniqSize = 0;       // --- current number of unique integers
uint64_t last = 0;           // --- last added integer
CMap m;                      // --- all numbers, grouped by their cardinality  

uint64_t next()
{
  if (totalSize > 0)
  {
    const double uniqCurrent = static_cast<double>(uniqSize) / totalSize;
    if (uniqCurrent <= uniq)
    {
      // ------ increase uniqueness by adding a new number to the sequence 
      const uint64_t k = ++last;
      m[1].insert(k);
      ++totalSize;
      ++uniqSize;
      return k;
    }
    else
    {
      // ------ decrease uniqueness by repeating an already used number
      CMap::iterator mIt = m.begin();
      while (true)
      {
        assert(mIt != m.cend());
        if (mIt->second.size() > 0) break;
        ++mIt;
      }
      USet& s = mIt->second;
      const USet::iterator sIt = s.begin();
      const uint64_t k = *sIt;
      m[mIt->first + 1].insert(k);
      s.erase(sIt);
      ++totalSize;
      return k;
    }
  }
  else
  {
    m[1].insert(0);
    ++totalSize;
    ++uniqSize;
    return 0;
  }
}

Any ideas on how to do this more simply?

You didn't say anything about trying to get each number to have the same cardinality. The code below does that approximately, but there are some cases where it chooses a number "out of turn" (mostly early in the sequence). Hopefully the simplicity and the constant space usage make up for that.

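#include <cassert>
#include <cstdio>

class Generator {
 public:
  explicit Generator(double uniqueness)
      : uniqueness_(uniqueness), count_(0), unique_count_(0),
        previous_non_unique_(0) {
    assert(uniqueness_ > 0.0 && uniqueness_ <= 1.0);
  }

  // 64-bit counters, so a very long sequence does not overflow them.
  long long Next() {
    ++count_;
    if (static_cast<double>(unique_count_) / static_cast<double>(count_) <
        uniqueness_) {
      // Below the target: emit a brand-new number.
      ++unique_count_;
      // Make the next repeat wrap around to unique_count_.
      previous_non_unique_ = 1;
      return unique_count_;
    } else {
      // At or above the target: repeat an old number, cycling downwards
      // from unique_count_ to 1.
      --previous_non_unique_;
      if (previous_non_unique_ <= 0) {
        previous_non_unique_ = unique_count_;
      }
      return previous_non_unique_;
    }
  }

 private:
  const double uniqueness_;        // Target uniqueness, in (0, 1].
  long long count_;                // Sequence length so far.
  long long unique_count_;         // Distinct numbers so far: 1..unique_count_.
  long long previous_non_unique_;  // Most recently repeated number.
};

int main(void) {
  Generator generator(0.25);
  while (true) {
    std::printf("%lld\n", generator.Next());
  }
}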
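One way to check that every prefix stays close to the target is to feed the generator into a small measuring helper. The sketch below is an illustration only: `worstDeviation` and its `warmup` parameter are hypothetical names, not part of the answer, and unlike the generator it needs memory proportional to the number of distinct values, so it is only meant for testing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_set>

// Draws `length` values from `next` and returns the largest
// |distinct / prefixLength - target| over prefixes longer than `warmup`.
template <typename NextFn>
double worstDeviation(NextFn next, double target, std::uint64_t length,
                      std::uint64_t warmup) {
  assert(target > 0.0 && target <= 1.0);
  std::unordered_set<std::uint64_t> seen;
  double worst = 0.0;
  for (std::uint64_t i = 1; i <= length; ++i) {
    seen.insert(static_cast<std::uint64_t>(next()));
    if (i > warmup) {
      const double current =
          static_cast<double>(seen.size()) / static_cast<double>(i);
      worst = std::max(worst, std::fabs(current - target));
    }
  }
  return worst;
}
```

With the class above it could be called as `worstDeviation([&] { return generator.Next(); }, 0.25, 10000, 100)`. A short induction on the rule in Next() suggests the uniqueness of a prefix of length n stays within 1/n of the target, so the result should be small once the warm-up is skipped.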
