
Randomly selecting rows from numpy array

I want to randomly select rows from a numpy array. Say I have this array:

import numpy as np

A = np.array([[1, 3, 0],
              [3, 2, 0],
              [0, 2, 1],
              [1, 1, 4],
              [3, 2, 2],
              [0, 1, 0],
              [1, 3, 1],
              [0, 4, 1],
              [2, 4, 2],
              [3, 3, 1]])

To randomly select, say, 6 rows, I am doing this:

B = A[np.random.choice(A.shape[0], size=6, replace=False), :]

I want another array C that contains the rows which were not selected for B.

Is there a built-in method to do this, or do I need to brute-force it by checking the rows of B against the rows of A?

You can use a boolean mask: draw random indices from an integer array with one entry per row of A, set those positions of the mask to True, then index A with the mask and with its negation. The ~ operator is an elementwise NOT:

idx = np.arange(A.shape[0])
mask = np.zeros_like(idx, dtype=bool)

selected = np.random.choice(idx, 6, replace=False)
mask[selected] = True

B = A[mask]
C = A[~mask]
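
Put together as a self-contained sketch, the mask approach might look like the following. The helper name `split_rows_by_mask`, the stand-in array, and the seeded `np.random.default_rng` generator are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np

def split_rows_by_mask(A, n_selected, rng=None):
    """Hypothetical helper: return (n_selected random rows of A, remaining rows)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros(A.shape[0], dtype=bool)
    # Draw n_selected distinct row indices and flag them in the mask.
    selected = rng.choice(A.shape[0], size=n_selected, replace=False)
    mask[selected] = True
    # Boolean indexing keeps the rows in their original order.
    return A[mask], A[~mask]

A = np.arange(30).reshape(10, 3)  # stand-in for the 10x3 example array
B, C = split_rows_by_mask(A, 6, rng=np.random.default_rng(0))
print(B.shape, C.shape)  # (6, 3) (4, 3)
```

Because both halves come from the same mask, every row of A lands in exactly one of B and C.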

You can make any number of row-wise random partitions of A by slicing a shuffled sequence of row indices:

ind = np.arange(A.shape[0])
np.random.shuffle(ind)
B = A[ind[:6], :]
C = A[ind[6:], :]

If you don't want to change the order of the rows in each subset, you can sort each slice of the indices:

B = A[sorted(ind[:6]), :]
C = A[sorted(ind[6:]), :]

(Note that the solution provided by @MaxNoe also preserves row order.)
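
As a rough sketch of the "any number of partitions" idea, the shuffled indices can be cut at several points at once. The use of `np.split`, the cut positions 5 and 8, the stand-in array, and the seeded generator are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice for reproducibility
A = np.arange(30).reshape(10, 3)  # stand-in for the 10x3 example array

# Shuffle the row indices once, then cut them at positions 5 and 8,
# giving index groups of size 5, 3 and 2.
ind = rng.permutation(A.shape[0])
groups = np.split(ind, [5, 8])

# Sorting each group keeps the rows in their original relative order.
parts = [A[np.sort(g)] for g in groups]
print([p.shape for p in parts])  # [(5, 3), (3, 3), (2, 3)]
```

Dropping the `np.sort` call leaves each partition in shuffled order instead.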

Solution

This gives you the indices for the selection:

sel = np.random.choice(A.shape[0], size=6, replace=False)

and this gives you B:

B = A[sel]

Get all indices that were not selected:

unsel = list(set(range(A.shape[0])) - set(sel))

and use them for C:

C = A[unsel]

Variation with NumPy functions

Instead of using set and list, you can use this:

unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
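
A small sketch that runs both complements side by side and checks that they agree. The stand-in array and seeded generator are assumptions for illustration, and the `np.delete` line is an alternative that does not appear in the answer above: it removes the selected rows directly instead of computing the unselected indices first.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice for reproducibility
A = np.arange(30).reshape(10, 3)  # stand-in for the 10x3 example array
sel = rng.choice(A.shape[0], size=6, replace=False)

# Pure-Python complement; sorted() makes the row order explicit,
# since set iteration order is not something to rely on.
unsel1 = sorted(set(range(A.shape[0])) - set(sel.tolist()))

# NumPy complement; setdiff1d returns sorted unique values.
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)

C = A[unsel2]
# Alternative (not from the answer): delete the selected rows directly.
C_alt = np.delete(A, sel, axis=0)
print(C.shape, np.array_equal(C, C_alt))  # (4, 3) True
```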

For the example array the pure Python version:

%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel)) 

100000 loops, best of 3: 8.42 µs per loop

is faster than the NumPy version:

%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)

10000 loops, best of 3: 77.5 µs per loop

For larger A the NumPy version is faster:

A = np.random.random((int(1e4), 3))
sel = np.random.choice(A.shape[0], size=6, replace=False)


%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))

1000 loops, best of 3: 1.4 ms per loop


%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)

1000 loops, best of 3: 315 µs per loop
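
To repeat the comparison outside IPython, something like the following sketch with the standard `timeit` module should work. The function names, the repeat counts, and the seeded generator are assumptions for illustration, and the absolute numbers will differ by machine and NumPy version, so no expected output is shown.

```python
import timeit
import numpy as np

def complement_set(n, sel):
    # Pure-Python version from the answer.
    return list(set(range(n)) - set(sel))

def complement_numpy(n, sel):
    # NumPy version from the answer.
    return np.setdiff1d(np.arange(n), sel)

rng = np.random.default_rng(0)  # seed is an arbitrary choice
timings = {}
for n in (10, 10_000):
    sel = rng.choice(n, size=6, replace=False)
    for fn in (complement_set, complement_numpy):
        # Best of 3 repeats, 100 calls each; store seconds per call.
        best = min(timeit.repeat(lambda: fn(n, sel), number=100, repeat=3)) / 100
        timings[(fn.__name__, n)] = best
        print(f"{fn.__name__:18s} n={n:6d}: {best * 1e6:8.1f} us per call")
```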
