How to convert a folder of images into X and Y batches with Keras?
Say I have a folder of images such as:
PetData
|
Dog - images
|
Cat - images
How would I transform it into (x_train, y_train),(x_test, y_test) format? I see this format used extensively with the MNIST dataset, which is loaded like this:
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
However, I'd like to do this with my own folder of images.
mnist.load_data() returns two tuples with the content of the images and the labels in uint8 arrays. You should get those arrays by loading the images from your folders (you can use modules such as PIL.Image to load X; your y is just the set of labels provided by the folder names).
PIL.Image usage example:
from PIL import Image
import glob
for infile in glob.glob("*.jpg"):
    im = Image.open(infile)
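Building on the loop above, a minimal sketch of collecting X and y from class subfolders might look like the following. The function name load_image_folder, the default 64x64 target size, and the assumption that each class folder directly contains only readable image files are illustrative choices, not part of the original answer.

```python
import os

import numpy as np
from PIL import Image


def load_image_folder(root, size=(64, 64)):
    """Load root/<class_name>/<image files> into (X, y, class_names).

    Hypothetical helper: assumes every subfolder of `root` is one class
    and every file inside it is a readable image.
    """
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    images, labels = [], []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            with Image.open(os.path.join(class_dir, fname)) as im:
                # Force 3 channels and a common size so the arrays stack.
                # PIL sizes are (width, height).
                im = im.convert("RGB").resize(size)
                images.append(np.asarray(im, dtype=np.uint8))
            labels.append(label)
    X = np.stack(images)                    # shape (N, height, width, 3)
    y = np.asarray(labels, dtype=np.uint8)  # shape (N,)
    return X, y, class_names
```

Calling X, y, class_names = load_image_folder("PetData") would then assign label 0 to the first folder in alphabetical order (Cat) and label 1 to the next (Dog).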
To split train/test you can use sklearn.model_selection.train_test_split:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
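To end up with the same layout that mnist.load_data() returns, the four outputs of train_test_split can be regrouped into two tuples. The sketch below uses small random arrays as stand-ins for real images; the stratify argument and the fixed random_state are optional additions, not something the original answer requires.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 12 fake 8x8 RGB images, 6 per class (0 = Cat, 1 = Dog).
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(12, 8, 8, 3), dtype=np.uint8)
y = np.array([0] * 6 + [1] * 6, dtype=np.uint8)

# stratify=y keeps the class ratio similar in both splits.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

# Regroup into the MNIST-style pair of tuples.
dataset = ((x_train, y_train), (x_test, y_test))
(x_train, y_train), (x_test, y_test) = dataset
print(x_train.shape, x_test.shape)
```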
Suppose your train or test images are in the folder PetData, with each class in a separate folder (Dog and Cat). You can use ImageDataGenerator to prepare your train/test data as below:
from keras import layers
from keras import models
model = models.Sequential()
#define your model
#..........
#......
#Using ImageDataGenerator to read images from directories
from keras.preprocessing.image import ImageDataGenerator
train_dir = "PetData/"
#PetData/Dog/ : dog images
#PetData/Cat/ : cat images
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(150, 150), batch_size=20)
#fit the model using train_generator (recent Keras versions accept the generator in model.fit directly)
history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=30)
Hope this helps!
If you want to import images from a folder on your computer, you can read the images one by one from the folder and insert them into a list.
Your folder format is as you have shown:
PetData
|
Dog - images
|
Cat - images
Assume path is a variable storing the address of the PetData folder. We will use OpenCV to import the images, but you can use other libraries as well.
import os
import cv2

data = []
label = []
Files = ['Dog', 'Cat']
label_val = 0  # 0 for Dog, 1 for Cat
for files in Files:
    cpath = os.path.join(path, files)
    cpath = os.path.join(cpath, 'images')
    for img in os.listdir(cpath):
        image_array = cv2.imread(os.path.join(cpath, img), cv2.IMREAD_COLOR)
        data.append(image_array)
        label.append(label_val)
    label_val = 1  # switch to the Cat label after the Dog folder is done
Convert the lists to NumPy arrays.
import numpy as np
data = np.asarray(data)
label = np.asarray(label)
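Note that np.asarray only yields a single (N, height, width, 3) array when every image has the same shape, and cv2.imread returns None for files it cannot read. A minimal sketch of a guard for both cases follows; the helper name to_arrays is hypothetical, and the fake images stand in for the output of the loading loop.

```python
import numpy as np


def to_arrays(images, labels):
    """Drop unreadable images (None) and stack the rest into arrays.

    Hypothetical helper, not part of OpenCV or NumPy.
    """
    kept = [(img, lab) for img, lab in zip(images, labels) if img is not None]
    if not kept:
        raise ValueError("no readable images")
    shapes = {img.shape for img, _ in kept}
    if len(shapes) != 1:
        # Images must be resized to a common size before stacking.
        raise ValueError(f"images have different shapes: {sorted(shapes)}")
    data = np.stack([img for img, _ in kept])
    label = np.asarray([lab for _, lab in kept])
    return data, label


# Stand-in data: two valid 4x4 BGR images and one failed read.
fake = [np.zeros((4, 4, 3), np.uint8), None, np.ones((4, 4, 3), np.uint8)]
data, label = to_arrays(fake, [0, 0, 1])
print(data.shape, label.shape)
```

If the shapes differ, resizing each image to a common size before appending it to the list (for example with cv2.resize) avoids the error.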
After importing the images, you can use train_test_split to split the data for training and testing.
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.33, random_state=42)