How to split an image dataset into X_train, y_train, X_test, y_test?
I have a dataset with the following structure:
Dataset/
|
|---- Pothole/
|      |---- umm001.jpg
|      |---- abd.jpg
|      |---- ...
|
|---- Road/
       |---- road005.jpg
       |---- ummm.jpg
       |---- ...
I want to split this dataset into X_train, y_train, X_test, y_test
such that:
# data: shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
Or:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
How can I do this?
You can build the X and y arrays using the os module:
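import os

X = []  # image file paths
y = []  # class label of each image (its folder name)
base_dir = '<full path to dataset folder>/'

# Every sub-folder of base_dir (e.g. Pothole, Road) is a target class.
for f in sorted(os.listdir(base_dir)):
    class_dir = os.path.join(base_dir, f)
    if os.path.isdir(class_dir):
        print(f"{f} is a target class")
        for i in sorted(os.listdir(class_dir)):
            print(f"{i} is an input image path")
            X.append(os.path.join(class_dir, i))
            y.append(f)

print(X)
print(y)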
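Many models expect integer class labels rather than folder names. Below is a minimal sketch, assuming the same Dataset/<class>/<image> layout, that builds X and y with pathlib and maps each class name to an integer index in sorted order. The function name `build_index` and the extension filter are illustrative assumptions, not part of the answer above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your dataset contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_index(base_dir):
    """Hypothetical helper: return (X, y, class_names).

    X: list of image paths (as strings)
    y: list of integer labels, y[k] is the class index of X[k]
    class_names: sorted folder names, class_names[label] gives the name back
    """
    base = Path(base_dir)
    # Each sub-directory is one class; sorting keeps the numbering deterministic.
    class_names = sorted(p.name for p in base.iterdir() if p.is_dir())
    X, y = [], []
    for label, name in enumerate(class_names):
        for img in sorted((base / name).iterdir()):
            # Skip stray non-image files such as .DS_Store or notes.
            if img.is_file() and img.suffix.lower() in IMAGE_EXTENSIONS:
                X.append(str(img))
                y.append(label)
    return X, y, class_names
```

Calling `X, y, class_names = build_index("<full path to dataset folder>")` would give labels 0 for Pothole and 1 for Road, and the resulting lists can be passed straight to train_test_split.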
Then you can use train_test_split(X, y, test_size=0.20) to get what you need; note that it returns X_train, X_test, y_train, y_test in that order. Bear in mind that X only contains file paths, so you will still have to open the images using another library such as pillow or scikit-image.
If you are planning to use PyTorch to train a neural network, you can use torchvision's ImageFolder class to create your dataset.
You can always use scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
Don't forget to import it:
from sklearn.model_selection import train_test_split
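A short sketch of train_test_split applied to path/label lists like the ones built with the os module. `stratify=y` keeps the Pothole/Road ratio the same in both splits and `random_state` makes the shuffle reproducible; both are optional parameters of train_test_split. The toy lists below are placeholders, not real data.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the X (image paths) and y (folder names) lists.
X = [f"Dataset/Pothole/p{i}.jpg" for i in range(10)] + [f"Dataset/Road/r{i}.jpg" for i in range(10)]
y = ["Pothole"] * 10 + ["Road"] * 10

# Note the return order: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.20,   # 20% of the samples go to the test set
    stratify=y,       # keep the class proportions equal in both splits
    random_state=42,  # fixed seed, so the same split is produced on every run
)
print(len(X_train), len(X_test))
```

Paths and labels are shuffled together, so `X_test[k]` still belongs to class `y_test[k]`.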
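Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.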