How to split an image dataset into X_train, y_train, X_test, y_test?

I have a dataset with the following structure:

Dataset/
   |
   |
   -----Pothole/
   |         |
   |         ------ umm001.jpg
   |         |
   |         ------ abd.jpg
   |         |
   |         ------ 
   |         |
   |
   |
   ----Road/
         |
         ------road005.jpg
         |
         ------ummm.jpg
         |
         ------
         |

I want to split this dataset into X_train, y_train, X_test and y_test,

so that I get the same kind of result as:

# data: shuffled and split between train and test
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Or:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

How can I do this?

You can build the X and y lists using the os module:

import os

X = []  # image file paths
y = []  # class labels (the sub-folder names)
base_dir = '<full path to dataset folder>/'
for f in sorted(os.listdir(base_dir)):
    class_dir = os.path.join(base_dir, f)
    if os.path.isdir(class_dir):
        print(f"{f} is a target class")
        for i in sorted(os.listdir(class_dir)):
            print(f"{i} is an input image path")
            X.append(os.path.join(class_dir, i))
            y.append(f)
print(X)
print(y)

Then you can use train_test_split(X, y, test_size=0.20) to get what you need, but bear in mind that X holds only file paths, so you will still have to open the images with another library such as Pillow or scikit-image.
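To get actual arrays in the (X_train, y_train), (X_test, y_test) form, the paths can be split first and the images loaded afterwards. The following is only a minimal sketch of that idea, assuming Pillow, NumPy and scikit-learn are installed. The function name load_dataset, the 64x64 target size, the seed, the integer class labels and the stratified split are my own assumptions, not part of the question.

```python
import os

import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split


def load_dataset(base_dir, image_size=(64, 64), test_size=0.20, seed=0):
    """Return (X_train, y_train), (X_test, y_test) as NumPy arrays.

    Hypothetical helper: labels are integer indices into the sorted list
    of class sub-folders found in base_dir.
    """
    class_names = sorted(
        d for d in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, d))
    )
    paths, labels = [], []
    for class_index, class_name in enumerate(class_names):
        class_dir = os.path.join(base_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            paths.append(os.path.join(class_dir, file_name))
            labels.append(class_index)

    # Split the paths, not the pixels, so only a list of strings is shuffled.
    # Note the return order: X_train, X_test, y_train, y_test.
    train_paths, test_paths, y_train, y_test = train_test_split(
        paths, labels, test_size=test_size, stratify=labels, random_state=seed
    )

    def read_images(image_paths):
        images = []
        for path in image_paths:
            with Image.open(path) as img:
                # Force RGB and one common size so the images stack cleanly.
                images.append(np.asarray(img.convert("RGB").resize(image_size)))
        return np.stack(images)

    X_train, X_test = read_images(train_paths), read_images(test_paths)
    return (X_train, np.array(y_train)), (X_test, np.array(y_test))
```

It would be called as (X_train, y_train), (X_test, y_test) = load_dataset('<full path to dataset folder>/'). Every image is loaded into memory at once, which is fine for a small dataset but may not be for a large one.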

If you are planning to use PyTorch to train a neural network, you can use torchvision's ImageFolder class to create your dataset.

You can always use scikit-learn's train_test_split: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Don't forget to import it:

from sklearn.model_selection import train_test_split
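For reference, a small usage sketch. The file paths and labels here are made up and stand in for the real dataset, and stratify and random_state are optional extras I added. The point to notice is the order of the four returned values.

```python
from sklearn.model_selection import train_test_split

# Hypothetical file paths and labels, standing in for the real dataset.
X = [f"Dataset/Pothole/img{i}.jpg" for i in range(5)] + \
    [f"Dataset/Road/img{i}.jpg" for i in range(5)]
y = ["Pothole"] * 5 + ["Road"] * 5

# The return order is X_train, X_test, y_train, y_test.
# stratify=y keeps the class proportions equal in both splits;
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```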
