
How can I split my dataset into train, test, and validation sets and store them in a pickle file?

Currently my dataset contains 161 folders with 500 images (.img) inside each folder, for a total of 80,500 images. Is there any code I can change? I am currently stuck at splitting the data into Train/Valid/Test and saving the result.

The code below shows how I load my 161 folders of data:

import os
import numpy as np
import cv2
import glob

# Collect every image path across the 161 class folders
folders = glob.glob('C:/Users/Pc/Desktop/datasets/*')
imagenames_list = []
for folder in folders:
    for f in glob.glob(folder + '/*.jpg'):
        imagenames_list.append(f)

# Read each image as grayscale and stack into a single array
read_images = []
for image in imagenames_list:
    read_images.append(cv2.imread(image, cv2.IMREAD_GRAYSCALE))

images = np.array(read_images)
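To link the splits back to the dataset, each image also needs a label that stays aligned with the `images` array. A minimal sketch, assuming each folder is one class and using the sorted folder index as the integer label (the helper `build_labels` and the demo paths are illustrative, not from the original code):

```python
import os
import numpy as np

def build_labels(image_paths, folders):
    """Map each image path to an integer label based on its parent folder."""
    folder_to_label = {os.path.normpath(f): i for i, f in enumerate(sorted(folders))}
    return np.array([folder_to_label[os.path.normpath(os.path.dirname(p))]
                     for p in image_paths])

# Tiny demo with made-up paths (no disk access needed)
folders = ['data/classA', 'data/classB']
paths = ['data/classA/1.jpg', 'data/classB/1.jpg', 'data/classA/2.jpg']
labels = build_labels(paths, folders)
print(labels.tolist())  # [0, 1, 0] -- one label per path, in the same order
```

Because the labels are built from `imagenames_list` in the same order the images were read, passing both arrays to `train_test_split` keeps images and labels paired.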

The code below shows how I split the data into 60% train / 20% test / 20% valid. Am I proceeding correctly, and will the train/test/valid splits link back to my dataset? How can I store them in a pickle file?

from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of samples as the dataset;
# replace X and y with the real images and labels.
X, y = np.random.random((80500, 10)), np.random.random((80500,))

# Hold out 20% for validation, then take 25% of the remainder as the
# test set, which is 20% of the total (0.2 / 0.8 = 0.25).
p = 0.2
new_p = (p * y.shape[0]) / ((1 - p) * y.shape[0])

X, X_val, y, y_val = train_test_split(X, y, test_size=p)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=new_p)

print([i.shape for i in [X_train, X_test, X_val]])


You can store them in a pickle file like this:

import pickle

dataset_dict = {"X_train": X_train, "X_test": X_test, "X_val": X_val, "y_train": y_train, "y_test": y_test, "y_val": y_val}

with open('dataset_dict.pickle', 'wb') as file:
    pickle.dump(dataset_dict, file)

And load them back like this:

with open('dataset_dict.pickle', 'rb') as file:
    dataset_dict = pickle.load(file)
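A quick round-trip check confirms the arrays survive serialization unchanged. A minimal, self-contained sketch with small toy arrays standing in for the real splits:

```python
import pickle
import numpy as np

# Toy stand-ins for the real train/test/val arrays
dataset_dict = {"X_train": np.zeros((3, 2)), "y_train": np.array([0, 1, 0])}

with open('dataset_dict.pickle', 'wb') as file:
    pickle.dump(dataset_dict, file)

with open('dataset_dict.pickle', 'rb') as file:
    loaded = pickle.load(file)

# The loaded arrays are element-for-element identical to the originals
assert np.array_equal(loaded["X_train"], dataset_dict["X_train"])
assert np.array_equal(loaded["y_train"], dataset_dict["y_train"])
```

Note that pickling six large uint8 arrays of 80,500 grayscale images can produce a multi-gigabyte file; the dictionary-of-arrays approach itself works the same regardless of size.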
