
Randomly selecting rows from numpy array

I want to randomly select rows from a NumPy array. Say I have this array:

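import numpy as np

# A must be an ndarray (not a nested list) for A.shape and fancy indexing to work
A = np.array([[1, 3, 0],
              [3, 2, 0],
              [0, 2, 1],
              [1, 1, 4],
              [3, 2, 2],
              [0, 1, 0],
              [1, 3, 1],
              [0, 4, 1],
              [2, 4, 2],
              [3, 3, 1]])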

To randomly select, say, 6 rows, I do this:

B = A[np.random.choice(A.shape[0], size=6, replace=False), :]

I also want another array, C, containing the rows that were not selected for B.

Is there a built-in method for this, or do I need a brute-force approach that checks the rows of B against the rows of A?

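You can use a boolean mask: draw random indices from an integer array with one entry per row of A, set those positions of the mask to True, then index A with the mask and with its negation. The ~ operator is an elementwise NOT: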

idx = np.arange(A.shape[0])
mask = np.zeros_like(idx, dtype=bool)

selected = np.random.choice(idx, 6, replace=False)
mask[selected] = True

B = A[mask]
C = A[~mask]
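As a quick check of the mask approach, here is a minimal sketch that wraps it in a helper and confirms the split sizes. The helper name split_rows_by_mask, the fixed seed, and the use of np.random.default_rng instead of np.random.choice are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def split_rows_by_mask(A, n_selected, rng):
    """Return (B, C): n_selected random rows of A and the remaining rows."""
    mask = np.zeros(A.shape[0], dtype=bool)
    # Draw distinct row indices and flag them in the mask.
    selected = rng.choice(A.shape[0], size=n_selected, replace=False)
    mask[selected] = True
    return A[mask], A[~mask]

rng = np.random.default_rng(0)    # fixed seed, assumed only for reproducibility
A = np.arange(30).reshape(10, 3)  # 10 distinct rows, easy to check
B, C = split_rows_by_mask(A, 6, rng)
print(B.shape, C.shape)  # (6, 3) (4, 3)
```

Because both B and C come from the same mask, every row of A lands in exactly one of them, and both keep the original row order.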

You can make any number of row-wise random partitions of A by slicing a shuffled sequence of row indices:

ind = np.arange(A.shape[0])   # one index per row of A
np.random.shuffle(ind)        # shuffles ind in place
B = A[ind[:6], :]             # first 6 shuffled rows
C = A[ind[6:], :]             # all remaining rows

If you don't want to change the order of the rows in each subset, you can sort each slice of the indices:

B = A[ sorted( ind[ :6 ] ), : ]
C = A[ sorted( ind[ 6: ] ), : ]

(Note that the solution provided by @MaxNoe also preserves row order.)
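To illustrate the "any number of partitions" point, the sketch below cuts a shuffled index array into three groups with np.split. The group sizes (5, 3 and 2), the seed, and the use of rng.permutation in place of an in-place shuffle are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, assumed only for reproducibility
A = np.arange(30).reshape(10, 3)  # 10 distinct rows

ind = rng.permutation(A.shape[0])  # shuffled copy of the row indices
# Cut the shuffled indices at positions 5 and 8: groups of 5, 3 and 2 rows.
groups = np.split(ind, [5, 8])
# Sorting each group keeps the original row order inside every partition.
parts = [A[np.sort(g)] for g in groups]
print([p.shape[0] for p in parts])  # [5, 3, 2]
```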

Solution

This gives you the indices for the selection:

sel = np.random.choice(A.shape[0], size=6, replace=False)

and this gives B:

B = A[sel]

Get all indices that were not selected:

unsel = list(set(range(A.shape[0])) - set(sel))

and use them for C:

C = A[unsel]

Variation with NumPy functions

Instead of using set and list, you can use this:

unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
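A short sketch of the complement step follows. It passes assume_unique=True, which is valid here because both index arrays contain no duplicates, and compares the result with np.delete, an alternative not mentioned above that removes the selected rows directly. The seed and test array are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, assumed only for reproducibility
A = np.arange(30).reshape(10, 3)  # 10 distinct rows
sel = rng.choice(A.shape[0], size=6, replace=False)

# Indices in 0..n-1 that are not in sel; both inputs are duplicate-free.
unsel = np.setdiff1d(np.arange(A.shape[0]), sel, assume_unique=True)
B = A[sel]
C = A[unsel]

# Alternative: drop the selected rows from A in one call.
C_alt = np.delete(A, sel, axis=0)
print(np.array_equal(C, C_alt))  # True
```

Note that B = A[sel] keeps the random draw order, while C follows the original row order of A.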

For the small example array, the pure Python version:

%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel)) 

100000 loops, best of 3: 8.42 µs per loop

is faster than the NumPy version:

%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)

10000 loops, best of 3: 77.5 µs per loop

For a larger A, the NumPy version is faster:

A = np.random.random((int(1e4), 3))
sel = np.random.choice(A.shape[0], size=6, replace=False)


%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))

1000 loops, best of 3: 1.4 ms per loop


%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)

1000 loops, best of 3: 315 µs per loop
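The %%timeit cells above only run in IPython or Jupyter. A rough equivalent with the standard-library timeit module might look like the sketch below. The function names, seed, and repeat count are assumptions, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed only for reproducibility
n_rows = 10_000
sel = rng.choice(n_rows, size=6, replace=False)

def with_sets():
    # Pure Python: set difference of all indices and the selected ones.
    return list(set(range(n_rows)) - set(sel))

def with_setdiff1d():
    # NumPy: sorted array of indices not present in sel.
    return np.setdiff1d(np.arange(n_rows), sel)

for fn in (with_sets, with_setdiff1d):
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{fn.__name__}: {seconds * 1e6:.1f} µs per call")
```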
