A function in my pure Python graph theory library performed very poorly compared with comparable MATLAB code, so I profiled some of the operations in that function.
I tracked the slowdown to the following result:
In [27]: timeit.timeit( 'permutation(138)[:4]', setup='from numpy.random import permutation', number=1000000)
Out[27]: 27.659916877746582
Compare this to the performance in MATLAB:
>> tic; for i=1:1000000; randperm(138,4); end; toc
Elapsed time is 4.593305 seconds.
I was able to improve performance considerably by using np.random.choice instead of np.random.permutation, as I had originally written.
In [42]: timeit.timeit( 'choice(138, 4)', setup='from numpy.random import choice', number=1000000)
Out[42]: 18.9618501663208
But it still doesn't come close to the MATLAB performance.
Is there another way of obtaining this behavior in pure Python with time performance approaching that of MATLAB?
Based on this solution, which showed how one can simulate the behavior of np.random.choice(..., replace=False) with a trick based on argsort / argpartition, you can recreate MATLAB's randperm(138,4), i.e. NumPy's np.random.choice(138, 4, replace=False), with np.argpartition as:
np.random.rand(138).argpartition(range(4))[:4]
Or with np.argsort, like so:
np.random.rand(138).argsort()[:4]
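The random-keys idea behind both one-liners can be sketched as follows. The helper name `sample_distinct` and its parameters are illustrative assumptions, not part of NumPy. It draws one uniform key per candidate index and keeps the indices of the k smallest keys; since every index occurs exactly once in the result of a sort or partition, the k values returned are distinct. `np.arange(k)` stands in for `range(k)` here.

```python
import numpy as np

def sample_distinct(n, k, use_partition=False):
    """Return k distinct indices from 0..n-1 (hypothetical helper)."""
    keys = np.random.rand(n)  # one random key per candidate index
    if use_partition:
        # Listing 0..k-1 as kth places the k smallest keys, in order, at the front
        return keys.argpartition(np.arange(k))[:k]
    # A full sort also works; only the first k positions are kept
    return keys.argsort()[:k]

idx = sample_distinct(138, 4)
```

Note that the indices are 0-based, whereas MATLAB's randperm returns values starting at 1.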
Let's time these two versions for performance comparison against the MATLAB version.
On MATLAB -
>> tic; for i=1:1000000; randperm(138,4); end; toc
Elapsed time is 1.058177 seconds.
On NumPy with np.argpartition -
In [361]: timeit.timeit( 'np.random.rand(138).argpartition(range(4))[:4]', setup='import numpy as np', number=1000000)
Out[361]: 9.063489798831142
On NumPy with np.argsort -
In [362]: timeit.timeit( 'np.random.rand(138).argsort()[:4]', setup='import numpy as np', number=1000000)
Out[362]: 5.74625801707225
The original proposed one with NumPy -
In [363]: timeit.timeit( 'choice(138, 4)', setup='from numpy.random import choice', number=1000000)
Out[363]: 6.793723535243771
So it seems one could use np.argsort for a marginal performance improvement over np.random.choice.
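One caveat about the comparison: choice(138, 4) samples with replacement by default, so it can return repeated values, unlike randperm(138,4). A minimal sketch of a sampling call that does guarantee distinct values, assuming NumPy 1.17 or later (where the Generator API is available), is shown here. The wrapper name `randperm_like` is hypothetical, and this variant was not part of the timings above, so no speed claim is made for it.

```python
import numpy as np

# Generator API; assumes NumPy 1.17+
rng = np.random.default_rng()

def randperm_like(n, k):
    """k distinct integers from 0..n-1 (hypothetical wrapper, 0-based unlike MATLAB)."""
    return rng.choice(n, size=k, replace=False)

sample = randperm_like(138, 4)
```

It can be timed with the same timeit.timeit pattern as the calls above to see how it compares on a given machine.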
How long does this take for you? I estimate 1-2 seconds.
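import numpy as np

def four():
    # Draw one integer below 138**4 and read off its four base-138 digits
    k = np.random.randint(138**4)
    a = k % 138
    b = k // 138 % 138
    c = k // 138**2 % 138
    d = k // 138**3 % 138
    # Retry recursively unless all four values are distinct
    return (a, b, c, d) if a != b and a != c and a != d and b != c and b != d and c != d else four()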
Update 1: At first I used random.randrange, but np.random.randint made the whole thing about twice as fast.
Update 2: Since NumPy's random function appears to be much faster, I tried this and it's another factor ~1.33 faster:
>>> def four():
...     a = randint(138)
...     b = randint(138)
...     c = randint(138)
...     d = randint(138)
...     return (a, b, c, d) if a != b and a != c and a != d and b != c and b != d and c != d else four()
...
>>> import timeit
>>> from numpy.random import randint
>>> timeit.timeit(lambda: four(), number=1000000)
2.3742770821572776
That's about 22 times faster than the original:
>>> timeit.timeit('permutation(138)[:4]', setup='from numpy.random import permutation', number=1000000)
51.80568455893672
(string vs lambda doesn't make a noticeable difference)
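The reason this retry approach is cheap is that a retry is rarely needed. As a rough sketch (the names `four_iterative`, `N`, `K` and `p_accept` are illustrative, and `math.prod` assumes Python 3.8+), here is a loop-based variant of four() together with the probability that a single draw of four values is already all distinct. Whether drawing the four values in one randint call is faster than four separate calls was not measured here.

```python
from math import prod
from numpy.random import randint

N, K = 138, 4

def four_iterative():
    """Loop-based variant of four(): redraw until all K values differ."""
    while True:
        draw = tuple(int(v) for v in randint(N, size=K))
        if len(set(draw)) == K:  # accept only if there are no repeats
            return draw

# Chance that one draw is accepted: (138 * 137 * 136 * 135) / 138**4
p_accept = prod(N - i for i in range(K)) / N**K  # roughly 0.957
```

With an acceptance probability of about 0.957, fewer than one draw in twenty is rejected, so the expected number of draws per result stays close to one.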