random.shuffle erasing items and not shuffling properly

I am initializing two multivariate Gaussian distributions as shown below, and I am trying to implement a machine learning algorithm that draws a decision boundary between the classes:

import numpy as np
import matplotlib.pyplot as plt
import torch
import random

mu0 = [-2,-2]
mu1 = [2, 2]
cov = np.array([[1, 0],[0, 1]]) 
X = np.random.randn(10,2)
L = np.linalg.cholesky(cov)
Y0 = mu0 + X@L.T 
Y1 = mu1 + X@L.T

I have two separated circles and I am trying to stack Y0 and Y1, shuffle them, and then break them into training and testing splits. First I append the class labels to the data, and then stack.

n,m = Y1.shape
class0 = np.zeros((n,1))
class1 = np.ones((n,1))
Y_0 = np.hstack((Y0,class0))
Y_1 = np.hstack((Y1,class1))

data = np.vstack((Y_0,Y_1))

Now when I call random.shuffle(data), the zero class takes over and I am left with only a small number of class-one instances.

random.shuffle(data)

Here is my data before shuffling:

print(data)
[[-3.16184428 -1.89491433  0.        ]
 [ 0.2710061  -1.41000924  0.        ]
 [-3.50742027 -2.04238337  0.        ]
 [-1.39966859 -1.57430259  0.        ]
 [-0.98356629 -3.02299622  0.        ]
 [-0.49583458 -1.64067853  0.        ]
 [-2.62577229 -2.32941225  0.        ]
 [-1.16005269 -2.76429318  0.        ]
 [-1.88618759 -2.79178253  0.        ]
 [-1.34790868 -2.10294791  0.        ]
 [ 0.83815572  2.10508567  1.        ]
 [ 4.2710061   2.58999076  1.        ]
 [ 0.49257973  1.95761663  1.        ]
 [ 2.60033141  2.42569741  1.        ]
 [ 3.01643371  0.97700378  1.        ]
 [ 3.50416542  2.35932147  1.        ]
 [ 1.37422771  1.67058775  1.        ]
 [ 2.83994731  1.23570682  1.        ]
 [ 2.11381241  1.20821747  1.        ]
 [ 2.65209132  1.89705209  1.        ]]

and after shuffling:

data
array([[-0.335667  , -0.60826166,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-2.22547604, -1.62833794,  0.        ],
       [-3.3287687 , -2.37694753,  0.        ],
       [-3.2915737 , -1.31558952,  0.        ],
       [-2.23912202, -1.54625136,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-2.23912202, -1.54625136,  0.        ],
       [-2.11217077, -2.70157476,  0.        ],
       [-3.25714184, -2.7679462 ,  0.        ],
       [-3.2915737 , -1.31558952,  0.        ],
       [-2.22547604, -1.62833794,  0.        ],
       [ 0.73756329,  1.46127708,  1.        ],
       [ 1.88782923,  1.29842524,  1.        ],
       [ 1.77452396,  2.37166206,  1.        ],
       [ 1.77452396,  2.37166206,  1.        ],
       [ 3.664333  ,  3.39173834,  1.        ],
       [ 3.664333  ,  3.39173834,  1.        ]])

Why is random.shuffle deleting my data? I just need all twenty rows to be shuffled, but it is repeating rows and I am losing data. I'm not assigning the result of random.shuffle to a variable; I simply call random.shuffle(data). Are there any other ways to shuffle my data?

Because the swap that random.shuffle performs does not work correctly on a multi-dimensional ndarray:

# Python 3.10.7 random.py
class Random(_random.Random):
    ...
    def shuffle(self, x, random=None):
        ...
        if random is None:
            randbelow = self._randbelow
            for i in reversed(range(1, len(x))):
                # pick an element in x[:i+1] with which to exchange x[i]
                j = randbelow(i + 1)
                x[i], x[j] = x[j], x[i]    # <----------------
        ...
    ...

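Indexing a multi-dimensional array with an integer returns a view rather than a copy, so the right-hand side of x[i], x[j] = x[j], x[i] holds two views into the same array. Once x[i] has been overwritten with row j, the view of the old x[i] shows the new contents as well, so x[j] is assigned its own values back. The original row i is lost and row j appears twice. For more information, you can refer to this question.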
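The effect can be reproduced on a tiny array. The following is a minimal sketch; the arrays a and b are made up for illustration, and the fancy-indexing swap is just one way to force copies, not something random.shuffle does:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
# The right-hand side builds a tuple of two views into `a`, not copies.
# After a[0] is overwritten with row 1, the view of "old row 0" also
# shows [3, 4], so a[1] is assigned its own values back.
a[0], a[1] = a[1], a[0]
print(a)  # both rows are now [3 4]; the row [1 2] is gone

b = np.array([[1, 2], [3, 4]])
# Indexing with a list (fancy indexing) copies the rows first,
# so this swap behaves as expected.
b[[0, 1]] = b[[1, 0]]
print(b)  # rows are swapped: [3 4] then [1 2]
```

This is the same loss the question shows, repeated once per iteration of the shuffle loop.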

A better choice is numpy.random.Generator.shuffle:

>>> data
array([[-1.88985877, -2.97312795,  0.        ],
       [-1.52352452, -2.19633099,  0.        ],
       [-2.06297352, -1.36627294,  0.        ],
       [-1.47460488, -2.09410403,  0.        ],
       [-1.18753167, -1.71069966,  0.        ],
       [-1.92878766, -1.19545861,  0.        ],
       [-2.4858627 , -2.66525855,  0.        ],
       [-2.97169999, -1.46985506,  0.        ],
       [-2.11395907, -2.19108576,  0.        ],
       [-2.63976951, -1.66742147,  0.        ],
       [ 2.11014123,  1.02687205,  1.        ],
       [ 2.47647548,  1.80366901,  1.        ],
       [ 1.93702648,  2.63372706,  1.        ],
       [ 2.52539512,  1.90589597,  1.        ],
       [ 2.81246833,  2.28930034,  1.        ],
       [ 2.07121234,  2.80454139,  1.        ],
       [ 1.5141373 ,  1.33474145,  1.        ],
       [ 1.02830001,  2.53014494,  1.        ],
       [ 1.88604093,  1.80891424,  1.        ],
       [ 1.36023049,  2.33257853,  1.        ]])
>>> rng = np.random.default_rng()
>>> rng.shuffle(data, 0)
>>> data
array([[-1.92878766, -1.19545861,  0.        ],
       [-2.97169999, -1.46985506,  0.        ],
       [ 2.07121234,  2.80454139,  1.        ],
       [ 1.36023049,  2.33257853,  1.        ],
       [ 1.93702648,  2.63372706,  1.        ],
       [-2.11395907, -2.19108576,  0.        ],
       [-2.63976951, -1.66742147,  0.        ],
       [ 1.02830001,  2.53014494,  1.        ],
       [ 2.11014123,  1.02687205,  1.        ],
       [ 1.88604093,  1.80891424,  1.        ],
       [-1.47460488, -2.09410403,  0.        ],
       [ 2.52539512,  1.90589597,  1.        ],
       [-1.18753167, -1.71069966,  0.        ],
       [-1.88985877, -2.97312795,  0.        ],
       [ 2.81246833,  2.28930034,  1.        ],
       [-2.06297352, -1.36627294,  0.        ],
       [ 1.5141373 ,  1.33474145,  1.        ],
       [-2.4858627 , -2.66525855,  0.        ],
       [-1.52352452, -2.19633099,  0.        ],
       [ 2.47647548,  1.80366901,  1.        ]])

In this example, numpy.random.shuffle also works, because the OP only needs to shuffle along the first axis. However, numpy.random.Generator.shuffle is the recommended interface for new code, and it also supports shuffling along other axes.
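Since the OP's goal is a training/testing split, here is one possible sketch of the whole step. The stand-in data, the fixed seed, and the 80/20 ratio are assumptions made for illustration; Generator.permutation is used here instead of shuffle because it returns a shuffled copy and leaves the original array untouched:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed fixed only to make the sketch reproducible

# Stand-in for the OP's 20x3 array: two feature columns plus a label column.
features = rng.normal(size=(20, 2))
labels = np.repeat([0.0, 1.0], 10)[:, None]
data = np.hstack((features, labels))

# Shuffle the rows; each row (features + label) stays intact.
shuffled = rng.permutation(data, axis=0)

# Assumed 80/20 split.
n_train = len(shuffled) * 4 // 5
train, test = shuffled[:n_train], shuffled[n_train:]

X_train, y_train = train[:, :2], train[:, 2]
X_test, y_test = test[:, :2], test[:, 2]
print(train.shape, test.shape)
```

Because the labels are stacked into the array before shuffling, every row keeps its own label through the split.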
