I want to generate a list of unique numbers from 0 to 2 million, excluding several numbers. The best solution I came up with is this:
import random

excludez = [34, 394849, 2233, 22345, 95995, 2920]
random.sample([i for i in range(0, 2000000) if i not in excludez], 64)
This generates 64 random ints from 0 to 2 million, excluding values in the list excludez.
This contains a list comprehension that builds a list of almost 2 million elements, so I am wondering if there is a faster solution to this. I am open to using any library, especially NumPy.
Edit:
The generated samples should contain unique numbers.
Edit 2:
I tested all the solutions using
print(timeit(lambda: solnX(), number=256))
I then ran that code 3 times for each solution.
Here are the average results:
Original: 135.838 seconds
@inspectorG4dget: 0.02750687366665261 seconds
@jdehesa 1st solution: 150.08836392466674 seconds (surprising, since it was a NumPy solution)
@jdehesa 2nd solution: 0.022973252333334433 seconds
@Andrej Kesely: 0.016359308333373217 seconds
@Divakar: 39.05853628633334 seconds
I timed these in Google Colab; here's a link to the notebook. I rearranged the code a bit so that all solutions were on a level playing field.
https://colab.research.google.com/drive/1ITYNrSTEVR_M5QZhqaSDmM8Q06IHsE73
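For anyone who wants to reproduce this kind of measurement outside the notebook, here is a minimal sketch of the approach described above (256 calls per timing, averaged over 3 runs). The helper name `average_time` and the stand-in `soln_example` are hypothetical and are not taken from the notebook.

```python
import random
from statistics import mean
from timeit import repeat

def soln_example():
    # Hypothetical stand-in for one of the solutions being compared
    return random.sample(range(2000000), 64)

def average_time(func, number=256, runs=3):
    # repeat() returns one total time (in seconds) per run,
    # and each run calls func `number` times
    return mean(repeat(func, number=number, repeat=runs))

print(average_time(soln_example))
```

Each reported figure is then the total time for 256 calls, not the time per call.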
Here's one with masking:
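import numpy as np

def random_uniq(excludez, maxnum, num_samples):
    # Mask of allowed values: True everywhere except the excluded numbers
    m = np.ones(maxnum, dtype=bool)
    m[excludez] = 0
    # Number of allowed values
    c = np.count_nonzero(m)
    # Pick unique positions among the allowed values
    idx = np.random.choice(c, num_samples, replace=False)
    m2 = np.ones(c, dtype=bool)
    m2[idx] = 0
    # Clear the picked positions in the full-size mask, then find what changed
    mc = m.copy()
    m[m] = m2
    out = np.flatnonzero(m != mc)
    return out

excludez = [34, 394849, 2233, 22345, 95995, 2920]
out = random_uniq(excludez, maxnum=2000000, num_samples=64)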
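If the sorted order of the result does not matter, the same masking idea can probably be written more directly by sampling from the indices that survive the mask. This is only a sketch and not the code that was timed in the question; the name `random_uniq_simple` is made up for illustration.

```python
import numpy as np

def random_uniq_simple(excludez, maxnum, num_samples):
    # True for every allowed value, False for the excluded ones
    mask = np.ones(maxnum, dtype=bool)
    mask[excludez] = False
    # Indices where the mask is True are exactly the allowed values
    candidates = np.flatnonzero(mask)
    # Draw without replacement so the samples are unique
    return np.random.choice(candidates, num_samples, replace=False)

excludez = [34, 394849, 2233, 22345, 95995, 2920]
out = random_uniq_simple(excludez, maxnum=2000000, num_samples=64)
print(len(out), len(set(out.tolist())))
```

Like the version above, this still does work proportional to maxnum, so it is unlikely to beat the approaches that only touch the sampled values.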
In [85]: excludez = set([34, 394849, 2233, 22345, 95995, 2920])  # faster lookups
In [86]: answer = set()  # since you don't really care about order
In [87]: while len(answer) < 64:
    ...:     r = random.randrange(0,2000000)
    ...:     if r not in excludez and r not in answer: answer.add(r)
    ...:
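The loop above can be wrapped in a reusable function. The following is a sketch under a few assumptions of my own: the name `sample_excluding`, its parameters, and the guard against asking for more values than are available are not part of the session above.

```python
import random

def sample_excluding(n, exclude, k):
    """Return a set of k unique ints from range(n), none of them in `exclude`."""
    exclude = set(exclude)
    # Count only excluded values that actually fall inside range(n)
    available = n - len({e for e in exclude if 0 <= e < n})
    if k > available:
        raise ValueError("not enough allowed values to sample from")
    answer = set()
    while len(answer) < k:
        r = random.randrange(n)
        if r not in exclude:
            answer.add(r)  # adding an existing member leaves the set unchanged
    return answer

result = sample_excluding(2000000, [34, 394849, 2233, 22345, 95995, 2920], 64)
print(len(result))
```

This kind of rejection loop is fast when k is small compared with the number of allowed values, as it is here; it slows down as k approaches that number because most draws get rejected.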
This is one method to do it with NumPy:
import numpy as np
np.random.seed(0)
excludez = np.sort([2, 3, 6, 7, 13])
n = 15
size = 5
# Get unique integers in a reduced range
r = np.random.choice(n - len(excludez), size, replace=False)
# Shift values accordingly so excluded values are avoided
shift = np.arange(len(excludez) + 1)
r += shift[np.searchsorted(excludez - shift[:-1], r, 'right')]
print(r)
# [ 4 12 8 14 1]
Here is the same algorithm with plain Python:
import random
import bisect
random.seed(0)
excludez = [2, 3, 6, 7, 13]
n = 15
size = 5
shift = range(len(excludez) + 1)
search = [exc - i for i, exc in enumerate(excludez)]
r = random.sample(range(n - len(excludez)), size)
r = [v + shift[bisect.bisect_right(search, v)] for v in r]
print(r)
# [10, 14, 0, 4, 8]
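To see why the shift works, it may help to apply it to every value of the reduced range instead of a random sample: the result should be exactly the non-excluded numbers, in order. Here is a small check of that with the same example values (the function name `shift_value` is only for illustration):

```python
import bisect

excludez = [2, 3, 6, 7, 13]  # must be sorted
n = 15

# Reduced-range values >= search[i] must skip at least i + 1 excluded numbers
search = [exc - i for i, exc in enumerate(excludez)]

def shift_value(v):
    # The bisect position is the number of excluded values to skip
    return v + bisect.bisect_right(search, v)

mapped = [shift_value(v) for v in range(n - len(excludez))]
expected = [i for i in range(n) if i not in excludez]
print(mapped)
# [0, 1, 4, 5, 8, 9, 10, 11, 12, 14]
print(mapped == expected)
# True
```

Because the mapping is one-to-one, sampling unique values from the reduced range and shifting them gives unique values that avoid the excluded numbers.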
One possible solution (method2 might contain duplicates, method3 does not):
from timeit import timeit
import random

excludez = [34, 394849, 2233, 22345, 95995, 2920]

def method1():
    return random.sample([i for i in range(0,2000000) if i not in excludez ], 64)

def method2():
    out = []
    while len(out) < 64:
        i = int(random.random() * 2000000)
        if i in excludez:
            continue
        out.append(i)
    return out

def method3():
    out = []
    while len(out) < 64:
        i = int(random.random() * 2000000)
        if i in excludez or i in out:
            continue
        out.append(i)
    return out

print(timeit(lambda: method1(), number=10))
print(timeit(lambda: method2(), number=10))
print(timeit(lambda: method3(), number=10))
Prints:
1.865599181000107
0.0002175730000999465
0.00039564000007885625
EDIT: Added int()
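On the remark that method2 might contain duplicates: the chance is small for these sizes but not zero. A rough estimate, assuming each accepted draw is uniform over the 1,999,994 allowed values, is the usual birthday-problem product:

```python
n_allowed = 2000000 - 6   # values that are not excluded
k = 64                    # number of draws

p_all_distinct = 1.0
for i in range(k):
    # the (i + 1)-th draw must avoid the i values already drawn
    p_all_distinct *= (n_allowed - i) / n_allowed

p_duplicate = 1.0 - p_all_distinct
print(f"{p_duplicate:.4%}")
```

This comes out at roughly 0.1% per call, so if uniqueness is a hard requirement the extra check in method3 is needed.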