Sampling unique column indexes for each row of a numpy array

Question

I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.

A = np.array([[3, 5, 2, 3, 3],
       [1, 3, 3, 4, 5],
       [3, 5, 4, 2, 1],
       [1, 2, 3, 5, 3]])

If I fix the number of columns to sample at 2, I want something like

np.array([[1,3],
          [0,4],
          [1,4],
          [2,3]])

I am looking for a loop-free, NumPy-based solution. I tried np.random.choice, but with replace=False I get the error

ValueError: Cannot take a larger sample than population when 'replace=False'
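The error comes from how replace=False is interpreted: it applies to the whole requested sample, not to each row separately. With size=(4, 2), NumPy has to draw 8 distinct values from only 5 columns, which is impossible. The minimal sketch below reproduces this with the sizes from the example above. Drawing row by row does satisfy the constraint, but it needs the Python loop the question wants to avoid.

```python
import numpy as np

n_rows, n_cols, k = 4, 5, 2

try:
    # replace=False covers the entire (4, 2) sample: 8 distinct values from range(5)
    np.random.choice(n_cols, size=(n_rows, k), replace=False)
except ValueError as err:
    print("ValueError:", err)

# Per-row sampling works (2 distinct values out of 5 per row), but loops in Python
loop_idx = np.array([np.random.choice(n_cols, k, replace=False) for _ in range(n_rows)])
print(loop_idx)
```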

Answer 1

Here's one vectorized approach inspired by this post -

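import numpy as np

def random_unique_indexes_per_row(A, N=2):
    m, n = A.shape
    # One uniform key per entry; argsort turns each row into a random permutation of 0..n-1
    return np.random.rand(m, n).argsort(1)[:, :N]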
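Why this works: np.random.rand(m, n) gives every row n independent uniform keys, and argsort along axis 1 turns those keys into a random permutation of the column numbers, so the first N entries of each row are distinct. Below is a sketch of the same idea using NumPy's Generator API (NumPy 1.17+). Swapping argsort for np.argpartition is my own assumption for large n with small N. It selects the N smallest keys without a full sort, so the set of chosen columns is still uniformly random, but their order within a row is whatever argpartition leaves. The helper name sample_cols is hypothetical.

```python
import numpy as np

def sample_cols(m, n, N, rng=None):
    """Return an (m, N) array of distinct column indexes per row (hypothetical helper)."""
    if not 1 <= N <= n:
        raise ValueError("need 1 <= N <= n")
    rng = np.random.default_rng() if rng is None else rng
    keys = rng.random((m, n))  # one uniform key per (row, column)
    # After partitioning at kth=N-1, the first N positions hold the N smallest keys
    return np.argpartition(keys, N - 1, axis=1)[:, :N]

idx = sample_cols(4, 5, 2, rng=np.random.default_rng(0))
print(idx)
```

If the order of the sampled columns matters, keep the original argsort version.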

Sample run -

In [146]: A
Out[146]: 
array([[3, 5, 2, 3, 3],
       [1, 3, 3, 4, 5],
       [3, 5, 4, 2, 1],
       [1, 2, 3, 5, 3]])

In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]: 
array([[4, 0],
       [0, 1],
       [3, 2],
       [2, 0]])

In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]: 
array([[2, 0, 1],
       [3, 4, 2],
       [3, 2, 1],
       [4, 3, 0]])
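The function returns column positions, not values. If you need the values themselves, np.take_along_axis (NumPy 1.15+) can gather them row by row. The sketch below also checks that no row repeats an index. The seed 42 is an arbitrary choice for reproducibility.

```python
import numpy as np

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])

rng = np.random.default_rng(42)
idx = rng.random(A.shape).argsort(axis=1)[:, :2]  # same argsort trick as above

# Each row of idx must contain distinct column numbers
sorted_idx = np.sort(idx, axis=1)
assert (np.diff(sorted_idx, axis=1) > 0).all()

# values[i, j] == A[i, idx[i, j]]
values = np.take_along_axis(A, idx, axis=1)
print(idx)
print(values)
```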

Answer 2

Something like this?

B = np.random.randint(5, size=(len(A), 2))
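Note that np.random.randint draws every entry independently, so a row can contain the same column twice. With 5 columns and 2 draws, any given row repeats an index with probability 1/5. The small helper below flags such rows; the name rows_with_repeats is hypothetical.

```python
import numpy as np

def rows_with_repeats(idx):
    """Boolean mask: True where a row contains some column index more than once."""
    s = np.sort(idx, axis=1)
    # Equal neighbours after sorting mean a repeated index in that row
    return (np.diff(s, axis=1) == 0).any(axis=1)

B = np.random.randint(5, size=(4, 2))
print(B)
print(rows_with_repeats(B))
```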

Answer 3

You can use np.random.choice() as follows:

import numpy as np

def random_indices(arr, n):
    x, y = arr.shape
    # Samples with replacement: an index can appear more than once in a row
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))
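As with randint, np.random.choice(np.arange(y), (x, n)) samples with replacement by default, so indices can repeat within a row. If NumPy 1.20+ is available, Generator.permuted can shuffle each row independently, which gives a loop-free way to sample without replacement per row. This is an alternative technique, not part of the original answer, and the helper name random_indices_unique is hypothetical.

```python
import numpy as np

def random_indices_unique(arr, n, rng=None):
    """Distinct column indexes per row via row-wise shuffling (hypothetical helper)."""
    x, y = arr.shape
    if n > y:
        raise ValueError("n cannot exceed the number of columns")
    rng = np.random.default_rng() if rng is None else rng
    cols = np.tile(np.arange(y), (x, 1))       # every row is 0..y-1
    return rng.permuted(cols, axis=1)[:, :n]   # shuffle each row independently, keep n

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])
print(random_indices_unique(A, 3))
```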

Demo:

In [34]: x, y = A.shape

In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]: 
array([[0, 2],
       [0, 1],
       [0, 1],
       [3, 1]])

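As an experimental approach, here is a way that usually returns distinct rows of indices. Note that it removes duplicate rows; indices within a single row can still repeat, as [1, 1] and [0, 0, 0] in the output below show: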

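In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     # Random projection gives one number per row; np.unique keeps one copy of each distinct row
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]  # 4 is hard-coded to the number of rows of A
    ...: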

In [61]: random_ind(A, 2)
Out[61]: 
array([[0, 1],
       [1, 0],
       [1, 1],
       [1, 4]])

In [62]: random_ind(A, 2)
Out[62]: 
array([[1, 0],
       [2, 0],
       [2, 1],
       [3, 1]])

In [64]: random_ind(A, 3)
Out[64]: 
array([[0, 0, 0],
       [1, 1, 2],
       [0, 4, 1],
       [2, 3, 1]])

In [65]: random_ind(A, 4)
Out[65]: 
array([[0, 4, 0, 3],
       [1, 0, 1, 4],
       [0, 4, 1, 2],
       [3, 0, 1, 0]])
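The expression ind.dot(np.random.rand(ind.shape[1])) projects each row onto a random vector. Identical rows map to the same number, and different rows map to different numbers with probability 1, so np.unique(..., return_index=True) keeps one copy of each row. NumPy 1.13+ can also de-duplicate rows directly with axis=0. The sketch below compares the two approaches on a fixed array.

```python
import numpy as np

ind = np.array([[0, 1],
                [1, 0],
                [0, 1],   # duplicate of row 0
                [3, 3]])

# Random-projection trick from the answer: one scalar "fingerprint" per row
_, first = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)

# Direct row-wise de-duplication (result is sorted lexicographically)
rows = np.unique(ind, axis=0)

print(ind[first])
print(rows)
```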

If fewer than 4 unique rows are generated, the slice ind[index][:4] does not raise an IndexError; NumPy slicing silently returns fewer rows. Check the shape of the result and, if it is too short, call the function again to get the desired result.
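Below is a sketch of the retry idea, assuming the goal is to always get one index row per row of arr. The name random_ind_retry and the max_tries limit are hypothetical. The sketch uses np.unique(..., axis=0, return_index=True) in place of the random projection and replaces the hard-coded 4 with the row count. It keeps the surviving rows in their original random order rather than sorted order. Like the original, it only guarantees distinct rows, not distinct indices within a row.

```python
import numpy as np

def random_ind_retry(arr, n, max_tries=100):
    """Retry random_ind-style sampling until x distinct rows are found (hypothetical helper)."""
    x, y = arr.shape
    if y ** n < x:
        raise ValueError("not enough distinct rows of indices exist")
    for _ in range(max_tries):
        ind = np.random.randint(low=0, high=y, size=(x * 2, n))
        _, first = np.unique(ind, axis=0, return_index=True)  # first occurrence of each row
        if len(first) >= x:
            keep = np.sort(first)[:x]  # preserve the random generation order
            return ind[keep]
    raise RuntimeError("no result after max_tries attempts")

print(random_ind_retry(np.zeros((4, 5)), 2))
```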

3 answers

solution1: score 2, accepted, 2018-07-11 07:58:02
solution2: score 1, 2018-07-11 07:33:20
solution3: score 0, 2018-07-11 07:28:12