Time performance of np.random.permutation, np.random.choice

Question

I encountered a function in my pure Python graph theory library with very poor time performance relative to comparable MATLAB code, so I profiled some of the operations in that function.

I tracked it down to the following result:

In [27]: timeit.timeit( 'permutation(138)[:4]', setup='from numpy.random import permutation', number=1000000)
Out[27]: 27.659916877746582

Compare this to the performance in MATLAB:

>> tic; for i=1:1000000; randperm(138,4); end; toc
Elapsed time is 4.593305 seconds.

I was able to improve performance considerably by using np.random.choice instead of np.random.permutation, which I had originally written:

In [42]: timeit.timeit( 'choice(138, 4)', setup='from numpy.random import choice', number=1000000)
Out[42]: 18.9618501663208

But it still doesn't come close to the MATLAB performance.

Is there another way of obtaining this behavior in pure Python with time performance approaching that of MATLAB?

Answer 1

Based on this solution, which showed how to simulate the behavior of np.random.choice(..., replace=False) with a trick based on argsort / argpartition, you can recreate MATLAB's randperm(138,4), i.e. NumPy's np.random.choice(138, 4, replace=False), with np.argpartition:

np.random.rand(138).argpartition(range(4))[:4]

Or with np.argsort like so -

np.random.rand(138).argsort()[:4]
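As a minimal sketch of why the trick works: sorting 138 independent uniform draws puts their indices in a uniformly random order, so the first four indices form a sample without replacement. The helper name sample_without_replacement and its use_partition flag are hypothetical, not part of NumPy.

```python
import numpy as np

def sample_without_replacement(n, k, use_partition=False):
    """Return k distinct integers from range(n) in random order.

    Hypothetical helper wrapping the argsort / argpartition trick.
    """
    keys = np.random.rand(n)  # one random sort key per candidate index
    if use_partition:
        # Only the first k positions are guaranteed to be in sorted order.
        return keys.argpartition(range(k))[:k]
    # Full sort of all n keys, then keep the first k indices.
    return keys.argsort()[:k]

sample = sample_without_replacement(138, 4)
print(sample)
```

Both branches return a NumPy array of four distinct indices below 138; which one is faster depends on n, k, and the NumPy version.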

Let's time these two versions for performance comparison against the MATLAB version.

On MATLAB -

>> tic; for i=1:1000000; randperm(138,4); end; toc
Elapsed time is 1.058177 seconds.

On NumPy with np.argpartition -

In [361]: timeit.timeit( 'np.random.rand(138).argpartition(range(4))[:4]', setup='import numpy as np', number=1000000)
Out[361]: 9.063489798831142

On NumPy with np.argsort -

In [362]: timeit.timeit( 'np.random.rand(138).argsort()[:4]', setup='import numpy as np', number=1000000)
Out[362]: 5.74625801707225

The originally proposed NumPy version -

In [363]: timeit.timeit( 'choice(138, 4)', setup='from numpy.random import choice', number=1000000)
Out[363]: 6.793723535243771

It seems that np.argsort gives a marginal performance improvement over np.random.choice (about 5.7 s versus 6.8 s here).
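Note that choice(138, 4) samples with replacement by default, so it can return repeated values, unlike randperm(138,4). For a like-for-like comparison, a rough timing harness might look like the sketch below. The candidates dictionary and the reduced run count are my own choices, and absolute timings will differ by machine and NumPy version.

```python
import timeit

# Candidate statements; each is meant to draw 4 distinct values from range(138).
candidates = {
    "permutation": "np.random.permutation(138)[:4]",
    "choice, replace=False": "np.random.choice(138, 4, replace=False)",
    "argsort": "np.random.rand(138).argsort()[:4]",
    "argpartition": "np.random.rand(138).argpartition(range(4))[:4]",
}

n_runs = 10000  # far fewer than the 1000000 runs used above, to keep this short
timings = {
    name: timeit.timeit(stmt, setup="import numpy as np", number=n_runs)
    for name, stmt in candidates.items()
}

# Print the candidates from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:>22}: {seconds:.4f} s for {n_runs} runs")
```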

Answer 2

How long does this take for you? I estimate 1-2 seconds.

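import numpy as np

def four():
    # Draw one integer and read it as four base-138 digits.
    k = np.random.randint(138**4)
    a = k % 138
    b = k // 138 % 138
    c = k // 138**2 % 138
    d = k // 138**3 % 138
    # Retry if any two of the four values coincide.
    return (a, b, c, d) if a != b and a != c and a != d and b != c and b != d and c != d else four()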
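The arithmetic in four() reads a single integer k below 138**4 as a four-digit number in base 138, so one call to the random generator yields four values. A small sketch of that decoding on a known input, with a hypothetical helper name:

```python
def base_digits(k, base=138, n_digits=4):
    """Split k into n_digits digits in the given base, least significant first."""
    digits = []
    for _ in range(n_digits):
        k, digit = divmod(k, base)  # peel off the lowest digit
        digits.append(digit)
    return tuple(digits)

# Encode four known digits, then decode them again.
k = 5 + 7 * 138 + 11 * 138**2 + 13 * 138**3
print(base_digits(k))  # (5, 7, 11, 13)
```

The four digits correspond to a, b, c and d in four(), in that order.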

Update 1: At first I used random.randrange, but np.random.randint made the whole thing about twice as fast.

Update 2: Since NumPy's random function appears to be much faster, I tried calling it once per value, and that is faster by another factor of ~1.33:

>>> def four():
        a = randint(138)
        b = randint(138)
        c = randint(138)
        d = randint(138)
        return (a, b, c, d) if a != b and a != c and a != d and b != c and b != d and c != d else four()

>>> import timeit
>>> from numpy.random import randint
>>> timeit.timeit(lambda: four(), number=1000000)
2.3742770821572776

That's about 22 times faster than the original:

>>> timeit.timeit('permutation(138)[:4]', setup='from numpy.random import permutation', number=1000000)
51.80568455893672

(Passing a string versus a lambda to timeit doesn't make a noticeable difference.)
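The recursive retry in four() can also be written as a loop, which avoids recursion and generalizes to other sample sizes. This is a sketch under the assumption that k is small relative to n, so collisions are rare; the name draw_distinct is hypothetical, and I have not timed it against four().

```python
from numpy.random import randint

def draw_distinct(n=138, k=4):
    """Draw k distinct integers from range(n) by rejection sampling."""
    while True:
        values = tuple(randint(n) for _ in range(k))
        # Accept the draw only if no value is repeated.
        if len(set(values)) == k:
            return values

print(draw_distinct())
```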

solution1
3 2016-02-24 04:33:03

solution2
2 ACCEPTED 2016-02-24 02:46:55