
How to randomly shuffle data and target in python?

I have a 4D array of training images whose dimensions correspond to (image_number, channels, width, height). I also have a 2D array of target labels whose dimensions correspond to (image_number, class_number). When training, I want to randomly shuffle the data using random.shuffle, but how can I make sure the labels are shuffled in the same order as the images? Thanks!

from sklearn.utils import shuffle
import numpy as np

X = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
y = np.array([0, 1, 2, 3, 4])
X, y = shuffle(X, y)
print(X)
print(y)

Example output (the order varies from run to run):

[[1 1 1]
 [3 3 3]
 [0 0 0]
 [2 2 2]
 [4 4 4]] 

[1 3 0 2 4]

There is another easy way to do this. Suppose there are N images in total. Then we can shuffle a list of indices and use it to index both arrays:

from random import shuffle

N = len(train)             # number of images
ind_list = list(range(N))
shuffle(ind_list)          # shuffles the index list in place
train_new = train[ind_list, :, :, :]
target_new = target[ind_list]
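
Applied to the shapes in the question, the same index-based idea can be sketched with NumPy alone. The array sizes and the names `images`, `labels`, `rng`, and `perm` below are assumptions made for illustration; the only point is that a single permutation indexes both arrays.

```python
import numpy as np

# Illustrative shapes only: 6 images, 3 channels, 4x4 pixels, 2 label columns.
n_images = 6
images = np.arange(n_images * 3 * 4 * 4).reshape(n_images, 3, 4, 4)
labels = np.stack([np.arange(n_images), np.arange(n_images) + 100], axis=1)

# Tag each image with its own index so the pairing can be checked afterwards.
images[:, 0, 0, 0] = np.arange(n_images)

rng = np.random.default_rng()
perm = rng.permutation(n_images)  # one random ordering of 0..n_images-1

# Fancy indexing returns shuffled copies; the original arrays are untouched.
images_shuffled = images[perm]
labels_shuffled = labels[perm]
```

Because both arrays are indexed by the same `perm`, row i of `images_shuffled` still belongs to row i of `labels_shuffled`.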

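If you want a NumPy-only solution, you can reindex the second array using the first. This assumes the training array carries each row's image number (column 0 in the example below), so that after shuffling that column can be used to index the targets: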

In [67]: train = np.arange(20).reshape(4,5).T

In [68]: target = np.hstack([np.arange(5).reshape(5,1), np.arange(100, 105).reshape(5,1)])

In [69]: train
Out[69]:
array([[ 0,  5, 10, 15],
       [ 1,  6, 11, 16],
       [ 2,  7, 12, 17],
       [ 3,  8, 13, 18],
       [ 4,  9, 14, 19]])

In [70]: target
Out[70]:
array([[  0, 100],
       [  1, 101],
       [  2, 102],
       [  3, 103],
       [  4, 104]])

In [71]: np.random.shuffle(train)

In [72]: target[train[:,0]]
Out[72]:
array([[  2, 102],
       [  3, 103],
       [  1, 101],
       [  4, 104],
       [  0, 100]])

In [73]: train
Out[73]:
array([[ 2,  7, 12, 17],
       [ 3,  8, 13, 18],
       [ 1,  6, 11, 16],
       [ 4,  9, 14, 19],
       [ 0,  5, 10, 15]])

Depending on what you want to do, you could also generate a random index for each dimension of your array with

random.randint(a, b)  # a and b are the lowest and highest valid indices (both inclusive)

which selects one of your objects at random. Note that this samples with replacement, so repeated calls do not produce a shuffle.
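
As a rough sketch of the random-selection idea, the snippet below draws one index and uses it for both arrays. The toy arrays `train` and `target` are made up for illustration and are much smaller than real image data.

```python
import random
import numpy as np

train = np.arange(20).reshape(5, 4)   # 5 toy samples, 4 features each
target = np.arange(5).reshape(5, 1)   # one label row per sample

# randint includes both endpoints, so the upper bound is len(train) - 1.
i = random.randint(0, len(train) - 1)

# Using the same index for both arrays keeps the sample and label paired.
sample, label = train[i], target[i]
```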

If you're looking for a synchronized (unison) shuffle, you can use the following function:

import numpy as np

def unisonShuffleDataset(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

The function above handles only two NumPy arrays. To support more, add extra parameters to the function and return each of them indexed by the same permutation p.
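
One possible way to generalize this without adding a parameter per array is to accept a variable number of arguments. This is only a sketch: the name `unison_shuffle` and the optional `rng` argument are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def unison_shuffle(*arrays, rng=None):
    """Return shuffled copies of all arrays, permuted in the same order."""
    if not arrays:
        return ()
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length along axis 0")
    rng = np.random.default_rng() if rng is None else rng
    p = rng.permutation(n)               # one permutation shared by all arrays
    return tuple(a[p] for a in arrays)   # fancy indexing makes copies

# Usage with three toy arrays whose rows correspond to each other.
a = np.arange(5)
b = a * 10
c = a.reshape(5, 1) + 100
a_s, b_s, c_s = unison_shuffle(a, b, c)
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the result reproducible.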

Build a fresh random generator from the same seed for each array to shuffle different arrays in the same order (the arrays must have the same length along the first axis):

>>> seed = np.random.SeedSequence()
>>> arrays = [np.arange(10).repeat(i).reshape(10, -1) for i in range(1, 4)]
>>> for ar in arrays:
...     np.random.default_rng(seed).shuffle(ar)
...
>>> arrays
[array([[1],
        [2],
        [7],
        [8],
        [0],
        [4],
        [3],
        [6],
        [9],
        [5]]),
 array([[1, 1],
        [2, 2],
        [7, 7],
        [8, 8],
        [0, 0],
        [4, 4],
        [3, 3],
        [6, 6],
        [9, 9],
        [5, 5]]),
 array([[1, 1, 1],
        [2, 2, 2],
        [7, 7, 7],
        [8, 8, 8],
        [0, 0, 0],
        [4, 4, 4],
        [3, 3, 3],
        [6, 6, 6],
        [9, 9, 9],
        [5, 5, 5]])]
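
For the image and label arrays from the question, the same-seed approach might look like the sketch below. The shapes and the integer seed are arbitrary choices for illustration. Both arrays are multi-dimensional and have the same length along the first axis, as in the example above.

```python
import numpy as np

# Toy data: 5 images of shape (1, 2, 2) and a 2-column label array.
images = np.zeros((5, 1, 2, 2), dtype=int)
images[:, 0, 0, 0] = np.arange(5)   # tag each image with its index
labels = np.stack([np.arange(5), np.arange(5) + 100], axis=1)

seed = 12345  # arbitrary value chosen for this sketch

# Each call builds a new generator from the same seed and shuffles
# the array in place along the first axis.
np.random.default_rng(seed).shuffle(images)
np.random.default_rng(seed).shuffle(labels)
```

Unlike the index-permutation approaches, this modifies the arrays in place, which avoids copying large image arrays but loses the original order.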
