
Split train/test based on comparison operators

I'm trying to figure out how to split the data based on these conditions so that I can run a CNN on it:

Split the training/testing dataset into two sets: one with class labels < 5 and one with class labels >= 5. Print out the shapes of the resulting two sets from both the training and testing datasets.

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.utils import to_categorical
from tensorflow import keras

# CIFAR-10 returns images of shape (N, 32, 32, 3) and labels of shape (N, 1)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

The code above is how I'm loading the data, and the code below is my attempt at the split. I'm not sure it's right, because the training images still have a shape of (50000, 32, 32, 3). Can anyone help me figure this out?

train_labels_first = train_labels[train_labels < 5]
test_labels_first = test_labels[test_labels < 5]


train_labels_second = train_labels[train_labels >= 5]
test_labels_second = test_labels[test_labels >= 5]
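To see why the image shape does not change, here is a small sketch on synthetic NumPy arrays (an assumption standing in for the real CIFAR-10 data, whose labels come back from load_data() with shape (N, 1)). Filtering the labels produces a new, flattened label array but leaves the images untouched, and the (N, 1) mask cannot be applied to the images as-is.

```python
import numpy as np

# Hypothetical stand-in for CIFAR-10: 20 blank 32x32x3 images and labels
# shaped (N, 1), the same label layout that cifar10.load_data() returns.
images = np.zeros((20, 32, 32, 3), dtype=np.uint8)
labels = (np.arange(20) % 10).reshape(-1, 1)   # classes 0..9, twice each

# Filtering only the labels returns a flat 1-D array; the images are untouched.
labels_first = labels[labels < 5]
print(labels_first.shape)   # (10,)
print(images.shape)         # (20, 32, 32, 3)

# The (N, 1) mask cannot index the images directly: NumPy matches the mask's
# second axis (length 1) against the image height (32) and raises IndexError.
mask_rejected = False
try:
    images[labels < 5]
except IndexError as err:
    mask_rejected = True
    print("IndexError:", err)
```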

Apply the same boolean indexing to your train and test images, not just to the labels. For example:

train_mask = train_labels.ravel() < 5   # labels are (N, 1); flatten so the mask matches axis 0
test_mask = test_labels.ravel() < 5
train_images_first = train_images[train_mask]
test_images_first = test_images[test_mask]

print(train_images_first.shape, test_images_first.shape)
# prints: (25000, 32, 32, 3) (5000, 32, 32, 3)
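A minimal sketch that produces all four sets the exercise asks for (the < 5 and >= 5 halves of both train and test), using one boolean mask per split. The helper name split_by_label and the synthetic arrays are hypothetical; with the real CIFAR-10 arrays, which hold 5,000 training and 1,000 test images per class, the printed shapes would be (25000, 32, 32, 3) twice for train and (5000, 32, 32, 3) twice for test.

```python
import numpy as np

def split_by_label(images, labels, threshold=5):
    """Hypothetical helper: split (images, labels) into a 'label < threshold'
    part and a 'label >= threshold' part using a single boolean mask."""
    flat = labels.ravel()              # (N, 1) -> (N,) so the mask lines up with axis 0
    low = flat < threshold
    return (images[low], flat[low]), (images[~low], flat[~low])

# Synthetic data with CIFAR-10's layout: 100 train / 20 test images of
# shape 32x32x3, labels of shape (N, 1), all ten classes equally represented.
train_images = np.zeros((100, 32, 32, 3), dtype=np.uint8)
train_labels = (np.arange(100) % 10).reshape(-1, 1)
test_images = np.zeros((20, 32, 32, 3), dtype=np.uint8)
test_labels = (np.arange(20) % 10).reshape(-1, 1)

(train_first, train_first_labels), (train_second, train_second_labels) = split_by_label(train_images, train_labels)
(test_first, test_first_labels), (test_second, test_second_labels) = split_by_label(test_images, test_labels)

print(train_first.shape, train_second.shape)  # (50, 32, 32, 3) (50, 32, 32, 3)
print(test_first.shape, test_second.shape)    # (10, 32, 32, 3) (10, 32, 32, 3)
```

Using ~low for the second half makes the two parts disjoint and guarantees that together they cover every sample exactly once.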

To get the matching labels, filter them with the same condition, e.g. assign train_labels[train_labels < 5] to a new variable; it holds the labels below 5 (classes 0 to 4), in the same order as the filtered images. Use >= 5 in the same way for the second set.
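A small sketch of why images and labels must be filtered with the same condition: if each synthetic image is filled with its own class id (a hypothetical setup used only for checking), the kept images and kept labels still agree after filtering. Indexing the (N, 1) label array with a 1-D row mask also keeps the (k, 1) layout, in case downstream code expects it.

```python
import numpy as np

# Hypothetical synthetic data: each "image" is filled with its own class id,
# so we can check that images and labels stay paired after filtering.
labels = (np.arange(30) % 10).reshape(-1, 1)              # shape (30, 1), like CIFAR-10
images = np.broadcast_to(labels.reshape(-1, 1, 1, 1), (30, 32, 32, 3)).astype(np.uint8)

mask = labels[:, 0] < 5                                    # 1-D row mask, shape (30,)
images_first = images[mask]
labels_first = labels[mask]                                # keeps the (k, 1) layout

print(images_first.shape, labels_first.shape)             # (15, 32, 32, 3) (15, 1)
# Every kept image still carries the same class id as its label.
print(np.array_equal(images_first[:, 0, 0, 0], labels_first[:, 0]))  # True
```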

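Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.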
