How to split an image dataset into X_train, y_train, X_test, y_test?

Question

I have a dataset with the following structure:

Dataset/
   |
   |
   -----Pothole/
   |         |
   |         ------ umm001.jpg
   |         |
   |         ------ abd.jpg
   |         |
   |         ------ 
   |         |
   |
   |
   ----Road/
         |
         ------road005.jpg
         |
         ------ummm.jpg
         |
         ------
         |

I want to split this dataset into X_train, y_train, X_test and y_test, in the same form as:

### data: shuffled and split between train and test
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Or,

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

How can I do this?

Answer 1

You can build the X and y lists using the os module:

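import os

X = []  # image file paths
y = []  # class labels (the sub-folder names)
base_dir = '<full path to dataset folder>'
for f in sorted(os.listdir(base_dir)):
    class_dir = os.path.join(base_dir, f)
    # each sub-folder (e.g. Pothole/, Road/) is one target class
    if os.path.isdir(class_dir):
        print(f"{f} is a target class")
        for i in sorted(os.listdir(class_dir)):
            print(f"{i} is an input image path")
            X.append(os.path.join(class_dir, i))
            y.append(f)
print(X)
print(y)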

Then you can call train_test_split(X, y, test_size=0.20) to get what you need; note that it returns four values in the order X_train, X_test, y_train, y_test. Bear in mind that X only holds file paths, so you will still have to open the images with another library such as Pillow or scikit-image.
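To make it concrete what a shuffled 80/20 split does to the X and y lists built above, here is a dependency-free sketch of the same idea: shuffle the indices once so every path stays paired with its label, then hold out 20% as the test set (rounding the test count up, as train_test_split does for a float test_size). shuffle_split is a hypothetical helper written for illustration, not scikit-learn's implementation; in practice, call train_test_split itself. The example paths are made up.

```python
import math
import random


def shuffle_split(X, y, test_size=0.20, seed=None):
    """Hypothetical helper: shuffle (X, y) pairs together and hold out a test fraction.

    A sketch of the idea behind sklearn's train_test_split, not a reimplementation.
    Returns X_train, X_test, y_train, y_test (the same order train_test_split uses).
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    n_test = math.ceil(test_size * len(X))  # e.g. 20% of 10 samples -> 2
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # shuffle indices so X[i] stays paired with y[i]
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Example with hypothetical file paths and folder-name labels
X = [f"Dataset/Pothole/p{i}.jpg" for i in range(5)] + [f"Dataset/Road/r{i}.jpg" for i in range(5)]
y = ["Pothole"] * 5 + ["Road"] * 5
X_train, X_test, y_train, y_test = shuffle_split(X, y, test_size=0.20, seed=0)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling one index list (rather than X and y separately) is what keeps each image path aligned with its label.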

If you are planning to use PyTorch to train a neural network, you can use torchvision's ImageFolder class to create your dataset.
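ImageFolder expects exactly the layout in the question (one sub-folder per class) and numbers the classes 0..K-1 in sorted folder-name order. The sketch below is a simplified, non-recursive, standard-library imitation of that labelling convention, not torchvision code; scan_image_folder and the IMG_EXTENSIONS subset are hypothetical. The same mapping also turns folder names into the integer labels that mnist.load_data()-style y arrays contain.

```python
import os
import tempfile

# Assumed subset of image extensions; adjust to what your dataset actually contains.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def scan_image_folder(root):
    """Hypothetical helper mimicking ImageFolder's class numbering (non-recursive).

    Each sub-folder of root is one class; classes are sorted by name and
    numbered 0..K-1. Returns (samples, class_to_idx), where samples is a
    list of (image_path, class_index) pairs.
    """
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            # keep only files whose extension looks like an image (case-insensitive)
            if fname.lower().endswith(IMG_EXTENSIONS):
                samples.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return samples, class_to_idx


# Usage on a throwaway copy of the question's layout (empty placeholder files)
with tempfile.TemporaryDirectory() as root:
    layout = {"Pothole": ["umm001.jpg", "abd.jpg"], "Road": ["road005.jpg", "ummm.jpg"]}
    for sub, files in layout.items():
        os.makedirs(os.path.join(root, sub))
        for fname in files:
            open(os.path.join(root, sub, fname), "w").close()
    samples, class_to_idx = scan_image_folder(root)
    print(class_to_idx)  # {'Pothole': 0, 'Road': 1}
    X = [path for path, _ in samples]
    y = [label for _, label in samples]
```

The resulting X and y can then be passed to train_test_split exactly like the string-labelled lists.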

Answer 2

You can always use scikit-learn's train_test_split: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Don't forget to import it:

from sklearn.model_selection import train_test_split

Answer 1: score 4, accepted (2019-01-23 18:59:24)
Answer 2: score -1 (2019-01-23 18:13:24)