Numpy: How to randomly split/select a matrix into n different matrices

  • I have a numpy matrix with a shape of (4601, 58).
  • I want to split the matrix randomly into 60%, 20%, 20% parts based on the number of rows.
  • I need this for a machine learning task.
  • Is there a numpy function that randomly selects rows?

You can use numpy.random.shuffle, which shuffles the rows of the array in place:

import numpy as np

N = 4601
data = np.arange(N*58).reshape(-1, 58)
np.random.shuffle(data)

a = data[:int(N*0.6)]
b = data[int(N*0.6):int(N*0.8)]
c = data[int(N*0.8):]
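
Note that np.random.shuffle reorders data in place. If the original row order has to be preserved, one option is to index with a random permutation instead. This is a minimal sketch, assuming a NumPy version that provides np.random.default_rng (1.17 or later); the seed and the names rng, perm and cuts are only illustrative:

```python
import numpy as np

N = 4601
data = np.arange(N * 58).reshape(-1, 58)

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
perm = rng.permutation(N)        # shuffled row indices; data itself is untouched

# Cut points at 60% and 80% of the rows
cuts = [int(N * 0.6), int(N * 0.8)]
a, b, c = np.split(data[perm], cuts)

print(a.shape, b.shape, c.shape)  # (2760, 58) (920, 58) (921, 58)
```

Because int() truncates, the last part absorbs the leftover row.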

A complement to HYRY's answer, in case you want to shuffle several arrays x, y, z consistently, where all of them have the same first dimension: x.shape[0] == y.shape[0] == z.shape[0] == n_samples.

You can do:

import numpy as np

n_samples = x.shape[0]  # x, y and z are your existing arrays
rng = np.random.RandomState(42)  # reproducible results with a fixed seed
indices = np.arange(n_samples)
rng.shuffle(indices)
x_shuffled = x[indices]
y_shuffled = y[indices]
z_shuffled = z[indices]

And then proceed with the split of each shuffled array as in HYRY's answer.
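
Alternatively, the shuffled index array itself can be split once and reused for every array, which keeps the rows aligned. A sketch under assumptions: x and y below are made-up example arrays, and the 60/20/20 ratios are taken from the question:

```python
import numpy as np

n_samples = 10
x = np.arange(n_samples * 3).reshape(n_samples, 3)  # example features
y = np.arange(n_samples)                            # example labels; y[i] belongs to x[i]

rng = np.random.RandomState(42)  # fixed seed for reproducible results
indices = np.arange(n_samples)
rng.shuffle(indices)

# Split the shuffled indices 60% / 20% / 20%, then reuse them for each array
train_idx, val_idx, test_idx = np.split(
    indices, [int(n_samples * 0.6), int(n_samples * 0.8)]
)

x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
x_test, y_test = x[test_idx], y[test_idx]
```

Since the same index arrays select rows from both x and y, each label stays paired with its feature row.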

If you want to randomly select rows, you could just use random.sample from the standard Python library:

import random

k = 10  # number of rows to sample (example value)
population = range(4601)  # your number of rows
choice = random.sample(population, k)

random.sample samples without replacement, so you don't need to worry about repeated rows ending up in choice. Given a numpy array called matrix, you can select the rows by indexing with that list, like this: matrix[choice].

Of course, k can be equal to the total number of elements in the population, and then choice would contain a random ordering of the indices for your rows. Then you can partition choice as you please, if that's all you need.
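
The full-permutation case just described could look roughly like this. It is only a sketch: the small matrix stands in for the real (4601, 58) array, and the 60/20/20 cut points are borrowed from the question:

```python
import random
import numpy as np

matrix = np.arange(20).reshape(10, 2)   # stand-in for the real array
n_rows = matrix.shape[0]

# k equal to the population size gives a random ordering of all row indices
choice = random.sample(range(n_rows), n_rows)

# Partition the index list 60% / 20% / 20%
first_cut = int(n_rows * 0.6)
second_cut = int(n_rows * 0.8)
part_a = matrix[choice[:first_cut]]
part_b = matrix[choice[first_cut:second_cut]]
part_c = matrix[choice[second_cut:]]
```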

Since you need it for machine learning, here is a function I wrote:

import numpy as np

def split_random(matrix, percent_train=70, percent_test=15):
    """
    Splits matrix data into randomly ordered sets 
    grouped by provided percentages.

    Usage:
    rows = 100
    columns = 2
    matrix = np.random.rand(rows, columns)
    training, testing, validation = \
    split_random(matrix, percent_train=80, percent_test=10)

    percent_validation 10
    training (80, 2)
    testing (10, 2)
    validation (10, 2)

    Returns:
    - training_data: percent_train e.g. 70%
    - testing_data: percent_test e.g. 15%
    - validation_data: remainder from 100% e.g. 15%
    Created by Uki D. Lucas on Feb. 4, 2017
    """

    percent_validation = 100 - percent_train - percent_test

    if percent_validation < 0:
        print("Make sure that the provided sum of " + \
        "training and testing percentages is equal, " + \
        "or less than 100%.")
        percent_validation = 0
    else:
        print("percent_validation", percent_validation)

    rows = matrix.shape[0]
    np.random.shuffle(matrix)  # note: shuffles the caller's array in place

    end_training = int(rows*percent_train/100)    
    end_testing = end_training + int((rows * percent_test/100))

    training = matrix[:end_training]
    testing = matrix[end_training:end_testing]
    validation = matrix[end_testing:]
    return training, testing, validation

# TEST:
rows = 100
columns = 2
matrix = np.random.rand(rows, columns)
training, testing, validation = split_random(matrix, percent_train=80, percent_test=10) 

print("training",training.shape)
print("testing",testing.shape)
print("validation",validation.shape)

print(split_random.__doc__)
  • training (80, 2)
  • testing (10, 2)
  • validation (10, 2)
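
One thing to be aware of: split_random shuffles the caller's array in place, and percentages that add up to more than 100 only produce a printed message. A possible variant is sketched below. It is not the author's function; the name split_random_copy and the seed parameter are assumptions made for illustration, and it relies on np.random.default_rng (NumPy 1.17 or later):

```python
import numpy as np

def split_random_copy(matrix, percent_train=70, percent_test=15, seed=None):
    """Hypothetical variant of split_random that leaves `matrix` unchanged."""
    if percent_train < 0 or percent_test < 0 or percent_train + percent_test > 100:
        raise ValueError("percent_train and percent_test must be non-negative "
                         "and sum to at most 100")

    rows = matrix.shape[0]
    rng = np.random.default_rng(seed)
    shuffled = matrix[rng.permutation(rows)]   # fancy indexing returns a copy

    end_training = int(rows * percent_train / 100)
    end_testing = end_training + int(rows * percent_test / 100)
    return (shuffled[:end_training],
            shuffled[end_training:end_testing],
            shuffled[end_testing:])

matrix = np.random.rand(100, 2)
original = matrix.copy()
training, testing, validation = split_random_copy(matrix, 80, 10, seed=0)
print(training.shape, testing.shape, validation.shape)  # (80, 2) (10, 2) (10, 2)
```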
