Sampling unique column indexes for each row of a numpy array

I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.

import numpy as np

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])

If I fix the required number of columns at 2, I want something like

np.array([[1,3],
          [0,4],
          [1,4],
          [2,3]])

I am looking for a non-loop, NumPy-based solution. I tried np.random.choice, but with replace=False I get this error:

ValueError: Cannot take a larger sample than population when 'replace=False'

Here's one vectorized approach inspired by this post -

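def random_unique_indexes_per_row(A, N=2):
    m, n = A.shape
    # argsort of random keys gives a random permutation of 0..n-1 per row;
    # keeping the first N columns yields N distinct indexes for each row.
    return np.random.rand(m, n).argsort(1)[:, :N]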

Sample run -

In [146]: A
Out[146]: 
array([[3, 5, 2, 3, 3],
       [1, 3, 3, 4, 5],
       [3, 5, 4, 2, 1],
       [1, 2, 3, 5, 3]])

In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]: 
array([[4, 0],
       [0, 1],
       [3, 2],
       [2, 0]])

In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]: 
array([[2, 0, 1],
       [3, 4, 2],
       [3, 2, 1],
       [4, 3, 0]])
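
A minimal sketch of a variant of the same random-key idea, in case a full sort per row is more work than needed. It assumes NumPy's Generator API (np.random.default_rng) is available and uses np.argpartition instead of argsort. The function name sample_columns_per_row and the seed parameter are illustrative, not from the original answer. Each row still gets N distinct column indexes, but their order within the row is unspecified.

```python
import numpy as np

def sample_columns_per_row(A, N=2, seed=None):
    """Return N distinct random column indexes for each row of A (sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    if N > n:
        raise ValueError("N cannot exceed the number of columns")
    # One independent random key per entry of A.
    keys = rng.random((m, n))
    # argpartition moves the positions of the N smallest keys of each row
    # into the first N slots (in no particular order), avoiding a full sort.
    return np.argpartition(keys, N - 1, axis=1)[:, :N]

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])
idx = sample_columns_per_row(A, N=2, seed=0)
print(idx)  # shape (4, 2); no index repeats within a row
```

If the order of the sampled indexes within each row matters, the argsort version above is the safer choice.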

Like this?

B = np.random.randint(5, size=(len(A), 2))

You can use np.random.choice() as follows (note that this samples with replacement, so a row may contain repeated indices):

def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))

Demo:

In [34]: x, y = A.shape

In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]: 
array([[0, 2],
       [0, 1],
       [0, 1],
       [3, 1]])
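
To see how often sampling with replacement repeats an index within a row, a small check like the following may help. This is only a sketch: the helper name rows_with_repeats is made up for illustration, and the sample size of 1000 rows is arbitrary. For 2 draws from 5 columns, the probability that both draws coincide is 1/5, so roughly one row in five should be flagged.

```python
import numpy as np

def rows_with_repeats(idx):
    """Boolean mask: True for rows of idx that contain a repeated value."""
    s = np.sort(idx, axis=1)
    # After sorting, any repeated value shows up as equal neighbours.
    return (s[:, 1:] == s[:, :-1]).any(axis=1)

rng = np.random.default_rng(0)
# Same kind of draw as np.random.randint(low=0, high=5, size=(x, 2)),
# with x = 1000 rows to make the fraction meaningful.
with_replacement = rng.integers(0, 5, size=(1000, 2))
frac = rows_with_repeats(with_replacement).mean()
print(f"fraction of rows with a repeated index: {frac:.3f}")
```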

As an experimental approach, here is a way that in about 99% of cases returns unique rows of indices (indices within a single row can still repeat, as the outputs show):

In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
    ...:

In [61]: random_ind(A, 2)
Out[61]: 
array([[0, 1],
       [1, 0],
       [1, 1],
       [1, 4]])

In [62]: random_ind(A, 2)
Out[62]: 
array([[1, 0],
       [2, 0],
       [2, 1],
       [3, 1]])

In [64]: random_ind(A, 3)
Out[64]: 
array([[0, 0, 0],
       [1, 1, 2],
       [0, 4, 1],
       [2, 3, 1]])

In [65]: random_ind(A, 4)
Out[65]: 
array([[0, 4, 0, 3],
       [1, 0, 1, 4],
       [0, 4, 1, 2],
       [3, 0, 1, 0]])

Note that return ind[index][:4] does not raise an IndexError when fewer than 4 unique rows are found; the slice simply returns fewer than 4 rows. In that case you can call the function again until you get the desired result.
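
The "repeat until it works" idea can also be applied per row, so that every row ends up with distinct indices. The following is a rough sketch of that rejection-sampling variant, not the answer's own method: the function name sample_unique_with_retry and the max_tries parameter are assumptions made for illustration. The loop runs over retry rounds, not over rows, and it gets slow when n is close to the number of columns.

```python
import numpy as np

def sample_unique_with_retry(arr, n, rng=None, max_tries=1000):
    """Draw n column indexes per row, redrawing rows that contain repeats."""
    rng = np.random.default_rng() if rng is None else rng
    m, k = arr.shape
    if n > k:
        raise ValueError("n cannot exceed the number of columns")
    idx = rng.integers(0, k, size=(m, n))
    for _ in range(max_tries):
        s = np.sort(idx, axis=1)
        # Rows where two sorted neighbours are equal contain a repeat.
        bad = (s[:, 1:] == s[:, :-1]).any(axis=1)
        if not bad.any():
            return idx
        # Redraw only the offending rows and check again.
        idx[bad] = rng.integers(0, k, size=(int(bad.sum()), n))
    raise RuntimeError("no valid sample found within max_tries rounds")

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])
print(sample_unique_with_retry(A, 2, rng=np.random.default_rng(0)))
```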

