
Fastest and most computationally efficient way to generate several unique random ints within a range, excluding a list of ints

I want to generate a list of unique numbers from 0 to 2 million, excluding several numbers. The best solution I came up with is this:

excludez = [34, 394849, 2233, 22345, 95995, 2920]

random.sample([i for i in range(0,2000000) if i not in excludez ], 64)

This generates 64 random ints from 0 to 2 million, excluding values in the list excludez.

This builds a list of almost 2 million elements with a list comprehension, so I am wondering if there is a faster solution. I am open to using any library, especially numpy.

Edit:

The generated samples should contain unique numbers.

Edit 2:

I tested all the solutions using

print(timeit(lambda: solnX(), number=256))

I then ran that measurement three times for each solution.

Here are the average results:

Original: 135.838 seconds

@inspectorG4dget: 0.02750687366665261 seconds

@jdehesa 1st solution: 150.08836392466674 seconds (surprising, since it was a numpy solution)

@jdehesa 2nd solution: 0.022973252333334433 seconds

@Andrej Kesely: 0.016359308333373217 seconds

@Divakar: 39.05853628633334 seconds

I timed everything in Google Colab; here's a link to the notebook. I rearranged the code a bit so that all solutions were on a level playing field.

https://colab.research.google.com/drive/1ITYNrSTEVR_M5QZhqaSDmM8Q06IHsE73
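
The timing procedure described in the question (256 calls per measurement, three measurements, then the average) could be sketched roughly as follows. The names candidate and average_time are hypothetical, and candidate is only a stand-in for whichever solution is being measured; this is not the code from the notebook.

```python
import random
from statistics import mean
from timeit import repeat

EXCLUDED = {34, 394849, 2233, 22345, 95995, 2920}


def candidate():
    # Hypothetical stand-in for the solution under test.
    picked = set()
    while len(picked) < 64:
        r = random.randrange(2_000_000)
        if r not in EXCLUDED:
            picked.add(r)
    return picked


def average_time(func, number=256, runs=3):
    # repeat() returns one total (in seconds) per run, each covering `number` calls.
    return mean(repeat(func, number=number, repeat=runs))


print(average_time(candidate))
```

Each reported figure is then the average total time for 256 calls, not the time for a single call.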

Here's one with masking:

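import numpy as np

def random_uniq(excludez, maxnum, num_samples):
    # Mask of allowed values: True everywhere except at the excluded ints
    m = np.ones(maxnum, dtype=bool)
    m[excludez] = False

    # Number of allowed values
    c = np.count_nonzero(m)

    # Pick num_samples distinct positions among the allowed values
    idx = np.random.choice(c, num_samples, replace=False)
    m2 = np.ones(c, dtype=bool)
    m2[idx] = False

    # Clear the picked positions in the mask, then find where it changed
    mc = m.copy()
    m[m] = m2
    out = np.flatnonzero(m != mc)
    return out

excludez = [34, 394849, 2233, 22345, 95995, 2920]
out = random_uniq(excludez, maxnum=2000000, num_samples=64)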

import random

excludez = set([34, 394849, 2233, 22345, 95995, 2920])  # faster lookups
answer = set()  # since you don't really care about order
# Keep drawing until 64 distinct, non-excluded values have been collected
while len(answer) < 64:
    r = random.randrange(0, 2000000)
    if r not in excludez and r not in answer:
        answer.add(r)
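
The loop above can be wrapped in a reusable function. The following is a minimal sketch; the name sample_excluding and its parameters are hypothetical, and the guard against an impossible request is an addition that the loop above does not have.

```python
import random


def sample_excluding(upper, k, excluded):
    """Return a set of k distinct ints from range(upper), none in excluded."""
    excluded = set(excluded)
    # Only excluded values that fall inside the range reduce the pool.
    valid = upper - sum(1 for e in excluded if 0 <= e < upper)
    if k > valid:
        raise ValueError("not enough valid values to sample from")
    picked = set()
    while len(picked) < k:
        r = random.randrange(upper)
        if r not in excluded:
            picked.add(r)  # adding a duplicate leaves the set unchanged
    return picked


result = sample_excluding(2_000_000, 64, [34, 394849, 2233, 22345, 95995, 2920])
print(len(result))
# 64
```

Rejection sampling like this is cheap when the excluded values and the requested sample are both small compared with the range, because almost every draw is accepted. It slows down as k approaches the number of valid values.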

This is one method to do it with NumPy:

import numpy as np

np.random.seed(0)
excludez = np.sort([2, 3, 6, 7, 13])
n = 15
size = 5
# Get unique integers in a reduced range
r = np.random.choice(n - len(excludez), size, replace=False)
# Shift values accordingly so excluded values are avoided
shift = np.arange(len(excludez) + 1)
r += shift[np.searchsorted(excludez - shift[:-1], r, 'right')]
print(r)
# [ 4 12  8 14  1]

Here is the same algorithm with plain Python:

import random
import bisect

random.seed(0)
excludez = [2, 3, 6, 7, 13]  # must be sorted in ascending order for bisect
n = 15
size = 5
shift = range(len(excludez) + 1)
search = [exc - i for i, exc in enumerate(excludez)]
r = random.sample(range(n - len(excludez)), size)
r = [v + shift[bisect.bisect_right(search, v)] for v in r]
print(r)
# [10, 14, 0, 4, 8]
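
To see why the shift works, it can help to map every index of the reduced range and check that the results are exactly the allowed values, in order. The sketch below reuses the plain Python approach from above; the helper names build_search, to_valid and sample_excluding_shift are hypothetical.

```python
import bisect
import random


def build_search(excluded, n):
    # Sort and deduplicate, then subtract each value's rank: entry i is the
    # number of allowed values that lie below the i-th excluded value.
    exc = sorted({e for e in excluded if 0 <= e < n})
    return [e - i for i, e in enumerate(exc)]


def to_valid(v, search):
    # Shift v up by the number of excluded values that must be skipped.
    return v + bisect.bisect_right(search, v)


def sample_excluding_shift(n, k, excluded):
    search = build_search(excluded, n)
    reduced = random.sample(range(n - len(search)), k)
    return [to_valid(v, search) for v in reduced]


search = build_search([2, 3, 6, 7, 13], 15)
print([to_valid(v, search) for v in range(15 - len(search))])
# [0, 1, 4, 5, 8, 9, 10, 11, 12, 14]
```

Because the mapping from the reduced range to the allowed values is one-to-one, sampling without replacement in the reduced range yields unique results with no retries.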

One possible solution. Note that method2 might return duplicates, while method3 does not:

from timeit import timeit
import random

excludez = [34, 394849, 2233, 22345, 95995, 2920]

def method1():
    return random.sample([i for i in range(0,2000000) if i not in excludez ], 64)

def method2():
    out = []
    while len(out) < 64:
        i = int(random.random() * 2000000)
        if i in excludez:
            continue
        out.append(i)
    return out

def method3():
    out = []
    while len(out) < 64:
        i = int(random.random() * 2000000)
        if i in excludez or i in out:
            continue
        out.append(i)
    return out

print(timeit(lambda: method1(), number=10))
print(timeit(lambda: method2(), number=10))
print(timeit(lambda: method3(), number=10))

Prints:

1.865599181000107
0.0002175730000999465
0.00039564000007885625

EDIT: Added int()
