简体   繁体   中英

How do I create a list of random numbers without duplicates?

I tried using random.randint(0, 100) , but some numbers were the same. Is there a method/module to create a list unique random numbers?

Note: The following code is based on an answer and has been added after the answer was posted. It is not a part of the question; it is the solution.

def getScores():
    # open files to read and write
    f1 = open("page.txt", "r");
    p1 = open("pgRes.txt", "a");

    gScores = [];
    bScores = [];
    yScores = [];

    # run 50 tests of 40 random queries to implement "bootstrapping" method 
    for i in range(50):
        # get 40 random queries from the 50
        lines = random.sample(f1.readlines(), 40);

This will return a list of 10 numbers selected from the range 0 to 99, without duplicates.

import random
random.sample(range(100), 10)

With reference to your specific code example, you probably want to read all the lines from the file once and then select random lines from the saved list in memory. For example:

all_lines = f1.readlines()
for i in range(50):
    lines = random.sample(all_lines, 40)

This way, you only need to actually read from the file once, before your loop. It's much more efficient to do this than to seek back to the start of the file and call f1.readlines() again for each loop iteration.

You can use the shuffle function from the random module like this:

import random

my_list = list(xrange(1,100)) # list of integers from 1 to 99
                              # adjust this boundaries to fit your needs
random.shuffle(my_list)
print my_list # <- List of unique random numbers

Note here that the shuffle method doesn't return any list as one may expect, it only shuffle the list passed by reference.

您可以首先创建一个从ab的数字列表,其中ab分别是列表中的最小和最大数字,然后使用Fisher-Yates算法或使用 Python 的random.shuffle方法对其进行洗牌。

Linear Congruential Pseudo-random Number Generator

O(1) Memory

O(k) Operations

This problem can be solved with a simple Linear Congruential Generator . This requires constant memory overhead (8 integers) and at most 2*(sequence length) computations.

All other solutions use more memory and more compute! If you only need a few random sequences, this method will be significantly cheaper. For ranges of size N , if you want to generate on the order of N unique k -sequences or more, I recommend the accepted solution using the builtin methods random.sample(range(N),k) as this has been optimized in python for speed.

Code

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for 
# python builtin "range".
#   Memory  -- storage for 8 integers, regardless of parameters.
#   Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    import random, math
    # Set a default values the same way "range" does.
    if (stop == None): start, stop = 0, start
    if (step == None): step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range.
    maximum = (stop - start) // step
    # Seed range with a random integer.
    value = random.randint(0,maximum)
    # 
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    # 
    #   1) "modulus" and "offset" are relatively prime.
    #   2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    #   3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    # 
    offset = random.randint(0,maximum) * 2 + 1      # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1                 # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(2**math.ceil(math.log2(maximum))) # Pick a modulus just big enough to generate all numbers (power of 2).
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus

Usage

The usage of this function "random_range" is the same as for any generator (like "range"). An example:

# Show off random range.
print()
for v in range(3,6):
    v = 2**v
    l = list(random_range(v))
    print("Need",v,"found",len(set(l)),"(min,max)",(min(l),max(l)))
    print("",l)
    print()

Sample Results

Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
 [1, 0, 7, 6, 5, 4, 3, 2]

Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
 [3, 5, 8, 7, 2, 6, 0, 1, 4]

Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
 [5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]

Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
 [12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]

Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
 [19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]

Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
 [11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]

The solution presented in this answer works, but it could become problematic with memory if the sample size is small, but the population is huge (eg random.sample(insanelyLargeNumber, 10) ).

To fix that, I would go with this:

answer = set()
sampleSize = 10
answerSize = 0

while answerSize < sampleSize:
    r = random.randint(0,100)
    if r not in answer:
        answerSize += 1
        answer.add(r)

# answer now contains 10 unique, random integers from 0.. 100

If you need to sample extremely large numbers, you cannot use range

random.sample(range(10000000000000000000000000000000), 10)

because it throws:

OverflowError: Python int too large to convert to C ssize_t

Also, if random.sample cannot produce the number of items you want due to the range being too small

 random.sample(range(2), 1000)

it throws:

 ValueError: Sample larger than population

This function resolves both problems:

import random

def random_sample(count, start, stop, step=1):
    def gen_random():
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        seen = set()
        seenadd = seen.add
        for i in (i for i in source() if i not in seen and not seenadd(i)):
            yield i
            if len(seen) == n:
                break

    return [i for i in gen_n_unique(gen_random,
                                    min(count, int(abs(stop - start) / abs(step))))]

Usage with extremely large numbers:

print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))

Sample result:

7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943

Usage where the range is smaller than the number of requested items:

print(', '.join(map(str, random_sample(100000, 0, 3))))

Sample result:

2, 0, 1

It also works with with negative ranges and steps:

print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))

Sample results:

2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3

So, I realize this post is 6 years old, but there is another answer with (usually) better algorithmic performance, although less practical with larger overhead.

Other answers include the shuffle method and the 'try until valid' method using sets.

If we're randomly choosing K integers without replacement from the interval 0...N-1, then the shuffle method uses O(N) storage and O(N) operations, which is annoying if we are choosing small K from large N. The set method only uses O(K) storage, but has worst case O(inf) expected O(n*log(n)) for K close to N. (Imagine trying to randomly get the last number out of two allowed answers, having already selected 999998, for k=n-1=10^6).

So the set method is fine for K~1, and the shuffle method is fine for K~N. Both use expected >K RNG calls.

Another way; you can pretend to do the Fisher–Yates shuffle, and for every new random selection, perform a binary search operation on your already-selected elements to find the value that you would get if you were actually storing an array of all the elements you haven't chosen yet.

If your already-selected values are [2,4], and your random number generator spits out 2 in the interval (N - num_already_selected), then you pretend to select out of [0,1,3,5,6,...] by counting the values less than the answer that have already been selected. In this case, your third selected value would be 3. Then, in the next step, if your random number were 2 again , it would map to 5 (in the pretend list [0,1,5,6]), because the (potential index of 5 in the sorted list of already-selected values [2,3,4], which is 3) + 2 = 5.

So store the already-selected values in a balanced binary search tree, store the rank (number of values less than that value) at each node, select a random number R from the range (0... n-(number already chosen)). Then descend the tree as if searching, but your search value is R plus the rank of whatever node you are at. When you reach a leaf node, add the random number to the rank of that node, and insert the sum into the balanced binary tree.

Once you have K elements, read them off the tree into an array and shuffle (if order is important).

This takes O(K) storage, O(K*log(K)) performance, and exactly K randint calls.

Example implementation of random sampling (non-random final ordering but you can O(K) shuffle after), O(k) storage and O(k log^2(k)) performance (not O(k log(k)) because we can't custom-descend balanced binary trees for this implementation):

from sortedcontainers import SortedList


def sample(n, k):
    '''
    Return random k-length-subset of integers from 0 to n-1. Uses only O(k) 
    storage. Bounded k*log^2(k) worst case. K RNG calls. 
    '''
    ret = SortedList()
    for i in range(k):
        to_insert = random.randint(0, n-1 - len(ret))
        to_insert = binsearch_adding_rank(ret, to_insert)
        ret.add(to_insert)

    return ret

def binsearch_adding_rank(A, v):
    l, u = 0, len(A)-1
    m=0
    while l <= u:
        m = l+(u-l)//2
        if v + m >= A[m]:
            l = m+1
            m+=1 # We're binary searching for partitions, so if the last step was to the right then add one to account for offset because that's where our insert would be.
        elif v+m < A[m]:
            u = m-1
    return v+m

And to show validity:

If we were doing the fisher-yates shuffle, having already chosen [1,4,6,7,8,9,15,16], with random number 5, our yet-to-be-chosen array would look like [0,2,3,5,10,11,12,...], so element 5 is 11. Thus our binsearch-function should return 11, given 5 and [1,4,6,7,8,9,15,16]:

assert binsearch_adding_rank([1,4,6,7,8,9,15,16], 5) == 11

Inverse of [1,2,3] is [0,4,5,6,7,8,...], the 5th element of which is 8, so:

assert binsearch_adding_rank([1,2,3], 5) == 8

Inverse of [2,3,5] is [0,1,4,6,...], the 1st element of which is (still) 1, so:

assert binsearch_adding_rank([2,3,5], 1) == 1

Inverse is [0,6,7,8,...], 3rd element is 8, and:

assert binsearch_adding_rank([1,2,3,4,5,10], 3) == 8

And to test the overall function:

# Edge cases: 
assert sample(50, 0) == []
assert sample(50, 50) == list(range(0,50))

# Variance should be small and equal among possible values:
x = [0]*10
for i in range(10_000):
    for v in sample(10, 5):
        x[v] += 1
for v in x:
    assert abs(5_000 - v) < 250, v
del x

# Check for duplication: 

y = sample(1500, 1000)
assert len(frozenset(y)) == len(y)
del y

In practice, though, use the shuffle method for K ~> N/2 and the set method for K ~< N/2.

edit: Here's another way of doing it using recursion! O(k*log(n)) I think.

def divide_and_conquer_sample(n, k, l=0):
    u = n-1
    # Base cases:
    if k == 0:
        return []
    elif k == n-l:
        return list(range(l, n))
    elif k == 1:
        return [random.randint(l, u)]

    # Compute how many left and how many right:
    m = l + (u-l)//2
    k_right = 0
    k_left = 0
    for i in range(k):
        # Base probability: (# of available values in right interval) / (total available values)
        if random.random() <= (n-m - k_right)/(n-l-k_right-k_left):
            k_right += 1
        else:
            k_left += 1
    # Recur
    return divide_and_conquer_sample(n, k_right, m) + divide_and_conquer_sample(m, k_left, l)

If the list of N numbers from 1 to N is randomly generated, then yes, there is a possibility that some numbers may be repeated.

If you want a list of numbers from 1 to N in a random order, fill an array with integers from 1 to N, and then use a Fisher-Yates shuffle or Python's random.shuffle() .

Here is a very small function I made, hope this helps!

import random
numbers = list(range(0, 100))
random.shuffle(numbers)

The answer provided here works very well with respect to time as well as memory but a bit more complicated as it uses advanced python constructs such as yield. The simpler answer works well in practice but, the issue with that answer is that it may generate many spurious integers before actually constructing the required set. Try it out with populationSize = 1000, sampleSize = 999. In theory, there is a chance that it doesn't terminate.

The answer below addresses both issues, as it is deterministic and somewhat efficient though currently not as efficient as the other two.

def randomSample(populationSize, sampleSize):
  populationStr = str(populationSize)
  dTree, samples = {}, []
  for i in range(sampleSize):
    val, dTree = getElem(populationStr, dTree, '')
    samples.append(int(val))
  return samples, dTree

where the functions getElem, percolateUp are as defined below

import random

def getElem(populationStr, dTree, key):
  msd  = int(populationStr[0])
  if not key in dTree.keys():
    dTree[key] = range(msd + 1)
  idx = random.randint(0, len(dTree[key]) - 1)
  key = key +  str(dTree[key][idx])
  if len(populationStr) == 1:
    dTree[key[:-1]].pop(idx)
    return key, (percolateUp(dTree, key[:-1]))
  newPopulation = populationStr[1:]
  if int(key[-1]) != msd:
    newPopulation = str(10**(len(newPopulation)) - 1)
  return getElem(newPopulation, dTree, key)

def percolateUp(dTree, key):
  while (dTree[key] == []):
    dTree[key[:-1]].remove( int(key[-1]) )
    key = key[:-1]
  return dTree

Finally, the timing on average was about 15ms for a large value of n as shown below,

In [3]: n = 10000000000000000000000000000000

In [4]: %time l,t = randomSample(n, 5)
Wall time: 15 ms

In [5]: l
Out[5]:
[10000000000000000000000000000000L,
 5731058186417515132221063394952L,
 85813091721736310254927217189L,
 6349042316505875821781301073204L,
 2356846126709988590164624736328L]

A very simple function that also solves your problem

from random import randint

data = []

def unique_rand(inicial, limit, total):

        data = []

        i = 0

        while i < total:
            number = randint(inicial, limit)
            if number not in data:
                data.append(number)
                i += 1

        return data


data = unique_rand(1, 60, 6)

print(data)


"""

prints something like 

[34, 45, 2, 36, 25, 32]

"""

In order to obtain a program that generates a list of random values without duplicates that is deterministic, efficient and built with basic programming constructs consider the function extractSamples defined below,

def extractSamples(populationSize, sampleSize, intervalLst) :
    import random
    if (sampleSize > populationSize) :
        raise ValueError("sampleSize = "+str(sampleSize) +" > populationSize (= " + str(populationSize) + ")")
    samples = []
    while (len(samples) < sampleSize) :
        i = random.randint(0, (len(intervalLst)-1))
        (a,b) = intervalLst[i]
        sample = random.randint(a,b)
        if (a==b) :
            intervalLst.pop(i)
        elif (a == sample) : # shorten beginning of interval                                                                                                                                           
            intervalLst[i] = (sample+1, b)
        elif ( sample == b) : # shorten interval end                                                                                                                                                   
            intervalLst[i] = (a, sample - 1)
        else :
            intervalLst[i] = (a, sample - 1)
            intervalLst.append((sample+1, b))
        samples.append(sample)
    return samples

The basic idea is to keep track of intervals intervalLst for possible values from which to select our required elements from. This is deterministic in the sense that we are guaranteed to generate a sample within a fixed number of steps (solely dependent on populationSize and sampleSize ).

To use the above function to generate our required list,

In [3]: populationSize, sampleSize = 10**17, 10**5

In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
Wall time: 293 ms

We may also compare with an earlier solution (for a lower value of populationSize)

In [5]: populationSize, sampleSize = 10**8, 10**5

In [6]: %time lst = random.sample(range(populationSize), sampleSize)
CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
Wall time: 2.18 s

In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
Wall time: 442 ms

Note that I reduced populationSize value as it produces Memory Error for higher values when using the random.sample solution (also mentioned in previous answers here and here ). For above values, we can also observe that extractSamples outperforms the random.sample approach.

PS : Though the core approach is similar to my earlier answer , there are substantial modifications in implementation as well as approach alongwith improvement in clarity.

You can use Numpy library for quick answer as shown below -

Given code snippet lists down 6 unique numbers between the range of 0 to 5. You can adjust the parameters for your comfort.

import numpy as np
import random
a = np.linspace( 0, 5, 6 )
random.shuffle(a)
print(a)

Output

[ 2.  1.  5.  3.  4.  0.]

It doesn't put any constraints as we see in random.sample as referred here .

The problem with the set based approaches ("if random value in return values, try again") is that their runtime is undetermined due to collisions (which require another "try again" iteration), especially when a large amount of random values are returned from the range.

An alternative that isn't prone to this non-deterministic runtime is the following:

import bisect
import random

def fast_sample(low, high, num):
    """ Samples :param num: integer numbers in range of
        [:param low:, :param high:) without replacement
        by maintaining a list of ranges of values that
        are permitted.

        This list of ranges is used to map a random number
        of a contiguous a range (`r_n`) to a permissible
        number `r` (from `ranges`).
    """
    ranges = [high]
    high_ = high - 1
    while len(ranges) - 1 < num:
        # generate a random number from an ever decreasing
        # contiguous range (which we'll map to the true
        # random number).
        # consider an example with low=0, high=10,
        # part way through this loop with:
        #
        # ranges = [0, 2, 3, 7, 9, 10]
        #
        # r_n :-> r
        #   0 :-> 1
        #   1 :-> 4
        #   2 :-> 5
        #   3 :-> 6
        #   4 :-> 8
        r_n = random.randint(low, high_)
        range_index = bisect.bisect_left(ranges, r_n)
        r = r_n + range_index
        for i in xrange(range_index, len(ranges)):
            if ranges[i] <= r:
                # as many "gaps" we iterate over, as much
                # is the true random value (`r`) shifted.
                r = r_n + i + 1
            elif ranges[i] > r_n:
                break
        # mark `r` as another "gap" of the original
        # [low, high) range.
        ranges.insert(i, r)
        # Fewer values possible.
        high_ -= 1
    # `ranges` happens to contain the result.
    return ranges[:-1]

I found a quite faster way than having to use the range function (very slow), and without using random function from python (I don´t like the random built-in library because when you seed it, it repeats the pattern of the random numbers generator)

import numpy as np

nums = set(np.random.randint(low=0, high=100, size=150)) #generate some more for the duplicates
nums = list(nums)[:100]

This is quite fast.

import random nums=[random.randint(1,100) for i in range(10) if random.randint(1,100) not in nums] 这个怎么样?

One straightforward alternative is to use np.random.choice() as shown below

np.random.choice(range(10), size=3, replace=False) 

This results in three integer numbers that are different from each other. eg, [1, 3, 5], [2, 5, 1]...

import random

sourcelist=[]
resultlist=[]

for x in range(100):
    sourcelist.append(x)

for y in sourcelist:
    resultlist.insert(random.randint(0,len(resultlist)),y)

print (resultlist)

Try using...

import random

LENGTH = 100

random_with_possible_duplicates = [random.randrange(-3, 3) for _ in range(LENGTH)]
random_without_duplicates = list(set(random_with_possible_duplicates)) # This removes duplicates

Advatages

Fast, efficient and readable.

Possible Issues

This method can change the length of the list if there are duplicates.

Edit: ignore my answer here . use python's random.shuffle or random.sample , as mentioned in other answers.

to sample integers without replacement between `minval` and `maxval`:
 
 
 
  
  import numpy as np minval, maxval, n_samples = -50, 50, 10 generator = np.random.default_rng(seed=0) samples = generator.permutation(np.arange(minval, maxval))[:n_samples] # or, if minval is 0, samples = generator.permutation(maxval)[:n_samples]
 
 

with jax:

 
 
 
  
  import jax minval, maxval, n_samples = -50, 50, 10 key = jax.random.PRNGKey(seed=0) samples = jax.random.shuffle(key, jax.numpy.arange(minval, maxval))[:n_samples]
 
 

If the amount of numbers you want is random, you can do something like this. In this case, length is the highest number you want to choose from.

If it notices the new random number was already chosen, itll subtract 1 from count (since a count was added before it knew whether it was a duplicate or not). If its not in the list, then do what you want with it and add it to the list so it cant get picked again.

import random
def randomizer(): 
            chosen_number=[]
            count=0
            user_input = int(input("Enter number for how many rows to randomly select: "))
            numlist=[]
            #length = whatever the highest number you want to choose from
            while 1<=user_input<=length:
                count=count+1
                if count>user_input:
                    break
                else:
                    chosen_number = random.randint(0, length)
                    if line_number in numlist:
                        count=count-1
                        continue
                    if chosen_number not in numlist:
                        numlist.append(chosen_number)
                        #do what you want here

If you wish to ensure that the numbers being added are unique, you could use a Set object

if using 2.7 or greater, or import the sets module if not.

As others have mentioned, this means the numbers are not truly random.

From the CLI in win xp:

python -c "import random; print(sorted(set([random.randint(6,49) for i in range(7)]))[:6])"

In Canada we have the 6/49 Lotto. I just wrap the above code in lotto.bat and run C:\home\lotto.bat or just C:\home\lotto .

Because random.randint often repeats a number, I use set with range(7) and then shorten it to a length of 6.

Occasionally if a number repeats more than 2 times the resulting list length will be less than 6.

EDIT: However, random.sample(range(6,49),6) is the correct way to go.

import random
result=[]
for i in range(1,50):
    rng=random.randint(1,20)
    result.append(rng)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM