Randomly shuffle items in each row of numpy array

I have a numpy array like the following:

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

I want to shuffle the items of each row separately, and I do not want the shuffle to be the same for every row (several examples I found just shuffle the column order).

For example, I want an output like the following:

output = np.array([[3, 2, 1],
                   [4, 6, 5],
                   [7, 3, 1]])

How can I shuffle each row independently in an efficient way? My actual array has over 100000 rows and 1000 columns.

If you only want to shuffle the columns, you can shuffle the transpose of your matrix:

In [86]: np.random.shuffle(Xtrain.T)

In [87]: Xtrain
Out[87]: 
array([[2, 3, 1],
       [5, 6, 4],
       [7, 3, 1]])

Note that np.random.shuffle() on a 2D array shuffles the rows, not the items within each row; that is, it changes the position of the rows. Therefore, if you shuffle the rows of the transposed matrix, you are actually shuffling the columns of your original array, and every row gets the same reordering.
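
The point above can be checked with a small sketch. The array below is an illustrative one (not from the question), built so that the column each value came from can be read back, and the seed is an assumption made only for reproducibility:

```python
import numpy as np

np.random.seed(0)  # assumed seed, only for reproducibility

# Illustrative array: row i holds 10*i + [0, 1, 2, 3],
# so subtracting 10*i recovers the original column of each value.
row_offsets = 10 * np.arange(5)[:, None]
a = np.arange(4)[None, :] + row_offsets

# Shuffling the transposed view reorders whole columns of `a` in place.
np.random.shuffle(a.T)

# Column order that each row ended up with.
col_order = a - row_offsets
print(col_order)  # every row shows the same permutation of 0..3
```

Because each printed row is identical, the transpose trick is a column shuffle rather than an independent shuffle per row.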

If you want a completely independent shuffle for each row, you can create random column indices for each row and then build the final array with simple fancy indexing:

In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...: 

Demo:

In [173]: crazyshuffle(Xtrain)
Out[173]: 
array([[1, 3, 2],
       [6, 5, 4],
       [7, 3, 1]])

In [174]: crazyshuffle(Xtrain)
Out[174]: 
array([[2, 3, 1],
       [4, 6, 5],
       [1, 3, 7]])

From: https://github.com/numpy/numpy/issues/5173

def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.

    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis.  `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return
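
As an alternative to the explicit loop, newer NumPy releases (1.20 and later, as far as I know) provide `Generator.permuted`, which shuffles each slice along an axis independently. The following is a minimal sketch on the array from the question; the seed and the variable names `shuffled` and `inplace` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, only for reproducibility

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

# Returns a shuffled copy; each row is permuted independently of the others.
shuffled = rng.permuted(Xtrain, axis=1)

# To shuffle in place instead, pass the same array as `out`.
inplace = Xtrain.copy()
rng.permuted(inplace, axis=1, out=inplace)

print(shuffled)
print(inplace)
```

On older NumPy versions without `default_rng` or `permuted`, the loop-based `disarrange` above remains an option.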

This solution is not efficient by any means, but I had fun thinking about it, so I wrote it down. Basically, you ravel the array and create an array of row labels and an array of indices. You shuffle the index array and use it to index both the raveled data and the row labels. Then you apply a stable argsort to the row labels to gather the data back into rows. Apply that index, reshape, and voilà, the data is shuffled independently by rows:

import numpy as np

r, c = 3, 4  # x.shape

x = np.arange(12) + 1  # Already raveled 
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()

np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]

inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)

Here is an IDEOne link
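
The same steps can be wrapped in a function that accepts any 2D array. This is only a sketch of the approach described above; the name `shuffle_rows_via_labels` is hypothetical and not part of the original answer:

```python
import numpy as np

def shuffle_rows_via_labels(arr):
    """Return a copy of 2D `arr` with each row shuffled independently.

    Hypothetical helper: shuffle the raveled data globally, then use a
    stable sort on the row labels to gather elements back into their rows.
    """
    r, c = arr.shape
    flat = arr.ravel()
    labels = np.repeat(np.arange(r), c)       # row label of each flat element
    perm = np.random.permutation(flat.size)   # one global shuffle
    flat, labels = flat[perm], labels[perm]
    # A stable sort keeps the shuffled relative order within each row label.
    order = np.argsort(labels, kind='mergesort')
    return flat[order].reshape(r, c)

x = np.arange(12).reshape(3, 4) + 1
out = shuffle_rows_via_labels(x)
print(out)
```

Each row of `out` contains the same values as the corresponding row of `x`, in a random order.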

We can create a random 2-dimensional matrix, argsort it along each row, and then use the resulting index matrix to reorder the target matrix.

target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]

shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]

target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
#       [7, 4, 6, 8, 5],
#       [4, 7, 9, 5, 6],
#       [6, 6, 8, 2, 8],
#       [1, 6, 7, 8, 3]])

Explanation

  • We use np.random.rand and argsort to mimic the effect of shuffling.
  • np.random.rand provides the randomness.
  • Then we use argsort with axis=1 to rank each row. This creates the index that can be used for reordering.
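
The steps above can be sketched as a reusable function. The name `shuffle_each_row` is hypothetical, and `np.take_along_axis` is used here in place of the explicit row-index array; it should be equivalent for this purpose on NumPy 1.15 or later:

```python
import numpy as np

def shuffle_each_row(arr):
    """Return a copy of 2D `arr` with each row shuffled independently.

    Hypothetical helper: draw random keys, argsort them per row, and use
    the resulting indices to reorder the elements of each row.
    """
    keys = np.random.rand(*arr.shape)        # one random key per element
    idx = np.argsort(keys, axis=1)           # a random permutation per row
    return np.take_along_axis(arr, idx, axis=1)

target = np.random.randint(10, size=(5, 5))
result = shuffle_each_row(target)
print(target)
print(result)
```

The cost is dominated by the per-row sort, so this trades a Python-level loop for O(n log n) work per row done inside NumPy.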

Let's say you have an array a with shape 100000 x 1000.

b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]

I don't know if this is faster than a loop, because it needs sorting, but maybe this solution will help you come up with something better, for example using np.argpartition instead of np.argsort.

You may use Pandas:

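import numpy as np
import pandas as pd
df = pd.DataFrame(Xtrain)  # Xtrain is the array from the question
# apply returns a new DataFrame; df itself is left unchanged
shuffled = df.apply(lambda x: np.random.permutation(x), axis=1, raw=True)
shuffled.values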

Change the keyword to axis=0 if you want to shuffle the items within each column instead.
