简体   繁体   中英

Randomly shuffle items in each row of numpy array

I have a numpy array like the following:

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

I want to shuffle the items of each row separately, but do not want the shuffle to be the same for each row (as in several examples just shuffle column order).

For example, I want an output like the following:

output = np.array([[3, 2, 1],
                   [4, 6, 5],
                   [7, 3, 1]])

How can I randomly shuffle each of the rows randomly in an efficient way? My actual np array is over 100000 rows and 1000 columns.

Since you want to only shuffle the columns you can just perform the shuffling on transposed of your matrix:

In [86]: np.random.shuffle(Xtrain.T)

In [87]: Xtrain
Out[87]: 
array([[2, 3, 1],
       [5, 6, 4],
       [7, 3, 1]])

Note that random.suffle() on a 2D array shuffles the rows not items in each rows. ie changes the position of the rows. Therefor if your change the position of the transposed matrix rows you're actually shuffling the columns of your original array.

If you still want a completely independent shuffle you can create random indexes for each row and then create the final array with a simple indexing:

In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...: 

Demo:

In [173]: crazyshuffle(Xtrain)
Out[173]: 
array([[1, 3, 2],
       [6, 5, 4],
       [7, 3, 1]])

In [174]: crazyshuffle(Xtrain)
Out[174]: 
array([[2, 3, 1],
       [4, 6, 5],
       [1, 3, 7]])

From: https://github.com/numpy/numpy/issues/5173

def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.

    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis.  `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return

This solution is not efficient by any means, but I had fun thinking about it, so wrote it down. Basically, you ravel the array, and create an array of row labels, and an array of indices. You shuffle the index array, and index the original and row label arrays with that. Then you apply a stable argsort to the row labels to gather the data into rows. Apply that index and reshape and viola, data shuffled independently by rows:

import numpy as np

r, c = 3, 4  # x.shape

x = np.arange(12) + 1  # Already raveled 
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()

np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]

inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)

Here is an IDEOne Link

We can create a random 2-dimensional matrix, sort it by each row, and then use the index matrix given by argsort to reorder the target matrix.

target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]

shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]

target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
#       [7, 4, 6, 8, 5],
#       [4, 7, 9, 5, 6],
#       [6, 6, 8, 2, 8],
#       [1, 6, 7, 8, 3]])

Explanation

  • We use np.random.rand and argsort to mimic the effect from shuffling.
  • random.rand gives randomness.
  • Then, we use argsort with axis=1 to help rank each row. This creates the index that can be used for reordering.

Lets say you have array a with shape 100000 x 1000.

b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]

I don't know if this is faster than loop, because it needs sorting, but with this solution maybe you will invent something better, for example with np.argpartition instead of np.argsort

You may use Pandas :

df = pd.DataFrame(X_train)
_ = df.apply(lambda x: np.random.permutation(x), axis=1, raw=True)
df.values

Change the keyword to axis=0 if you want to shuffle columns.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM