I am trying to create my own image dataset for machine learning.
The workflow I have in mind is the following:
① Load all image files in the folder as arrays.
② Label the loaded images.
③ Split the loaded data into image_data and label_data.
④ Finally, split image_data into image_train_data and image_test_data, and split label_data into label_train_data and label_test_data.
However, I am stuck at the first step (①).
How can I load all the image data efficiently?
And if you were to build an image dataset for machine learning following this workflow, how would you handle it?
I wrote the following code:
cat_im = cv2.imread("C:\\Users\\path\\cat1.jpg")
But am I forced to write \\cat1.jpg, \\cat2.jpg, \\cat3.jpg, ... one by one?
# You can find all images in a folder by their file extension
import glob
import os
import cv2

# glob.glob returns the matching image paths as a list
all_images_path = glob.glob(os.path.join('some_folder', 'images', '*.png'))

# Then loop over the paths and load each file
loaded_images = []
for image_path in all_images_path:
    image = cv2.imread(image_path)
    loaded_images.append(image)

# Assume the labels are just the file names, such as cat1.png, cat2.png, etc.
labels = []
for image_path in all_images_path:
    labels.append(os.path.basename(image_path))
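Steps ② to ④ of the workflow can be handled along the same lines. Below is a minimal sketch using only the standard library. The helper names `label_from_path` and `train_test_split_pairs` are hypothetical, and it assumes the class is the file name with the extension and trailing digits removed (cat1.png becomes cat). The example paths stand in for the list returned by glob.glob.

```python
import os
import random
import re


def label_from_path(image_path):
    """Derive a class label from a file name, e.g. 'cat12.png' -> 'cat'."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    # Strip trailing digits so 'cat1', 'cat2', ... share the label 'cat'.
    return re.sub(r"\d+$", "", stem)


def train_test_split_pairs(items, labels, test_ratio=0.2, seed=0):
    """Shuffle items and labels together, then split them into train and test parts."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    indices = list(range(len(items)))
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_items = [items[i] for i in train_idx]
    test_items = [items[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_items, test_items, train_labels, test_labels


# Hypothetical file names standing in for the result of glob.glob(...)
example_paths = ["cat1.png", "cat2.png", "dog1.png", "dog2.png", "dog3.png"]
example_labels = [label_from_path(p) for p in example_paths]

x_train, x_test, y_train, y_test = train_test_split_pairs(example_paths, example_labels)
print(len(x_train), len(x_test))
```

The same function works if you pass the loaded image arrays instead of the paths, as long as the images and labels are in the same order. If scikit-learn is installed, sklearn.model_selection.train_test_split performs this kind of split as well.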