How to preprocess a huge dataset and save it such that I can train the data in Python
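I want to preprocess a huge image dataset (600k images) for training a model. However, it takes up too much memory. I have been searching for solutions, but none of the ones I found here fit my problem. Here is part of my code. I am still new to deep learning, and I think I did a poor job of preprocessing the data. If anyone knows how to solve this memory problem, it would be greatly appreciated.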
import random

import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Read the CSV file
data_frame = pd.read_csv("D:\\Downloads\\ndsc-beginner\\train.csv")

# Load one image as a grayscale numpy array
def load_image(img_path, target_size=(256, 256)):
    # Append the .jpg extension if the path does not already end with it
    if not img_path.endswith('.jpg'):
        img_path += '.jpg'
    # color_mode='grayscale' replaces the deprecated grayscale=True argument
    img = load_img(img_path, target_size=target_size, color_mode='grayscale')
    # Convert to a numpy array
    return img_to_array(img)

IMG_SIZE = 256
image_arr = []
# Get the category column values
category_id = data_frame['Category']
# Change the category to one-hot - has 50 categories
dummy_cat_id = keras.utils.to_categorical(category_id, 50)
# Get the image paths column values; read_csv has already consumed the header,
# so start at row 0 to keep the paths aligned with the labels
path_list = data_frame.iloc[:, -1]
# Batch generator: yield successive slices of `data`
def batch_gen(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# Append each image array and its category label to image_arr
def extract_data(data_frame):
    batch_size = 1000
    index = 0
    for path in batch_gen(path_list, batch_size):
        for mini_path in path:
            image_arr.append([load_image(mini_path), dummy_cat_id[index]])
            print(index)
            index += 1

extract_data(data_frame)
random.shuffle(image_arr)

# Features and labels for training data
trainImages = np.array([i[0] for i in image_arr]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
trainLabels = np.array([i[1] for i in image_arr])
trainImages = trainImages.astype('float32')
trainImages /= 255.0
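For a sense of scale, here is a rough estimate of what the code above tries to hold in memory at once. It is only back-of-the-envelope arithmetic: it assumes exactly 600,000 images, one channel, and 4 bytes per value (img_to_array returns float32 by default), and it ignores Python list overhead and the temporary copies made by np.array and reshape. The helper name array_bytes is made up for this illustration.

```python
def array_bytes(n_images, height, width, channels, bytes_per_value):
    """Bytes needed to hold n_images dense arrays of the given shape."""
    return n_images * height * width * channels * bytes_per_value


N_IMAGES = 600_000  # dataset size stated in the question
SIDE = 256          # IMG_SIZE in the question's code

# One grayscale channel stored as float32 (4 bytes) versus uint8 (1 byte)
float32_bytes = array_bytes(N_IMAGES, SIDE, SIDE, 1, 4)
uint8_bytes = array_bytes(N_IMAGES, SIDE, SIDE, 1, 1)

print(f"float32: {float32_bytes / 2**30:.1f} GiB")  # float32: 146.5 GiB
print(f"uint8:   {uint8_bytes / 2**30:.1f} GiB")    # uint8:   36.6 GiB
```

Either figure is far beyond typical RAM, which is why the images need to be loaded in batches during training rather than all up front.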
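I see that in your preprocessing you are only converting the images to grayscale and normalizing them. If you are using Keras, you can use the following, which normalizes the images and converts them to grayscale as they are loaded. Make sure you provide the path to the directory that contains the class folders your images are in. You can change the class mode to categorical if needed.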
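from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from [0, 255] to [0, 1]
train_datagen = ImageDataGenerator(rescale=1./255)
# readPath is the dataset root; training/ must contain one subfolder per class
train_gen = train_datagen.flow_from_directory(
    f'{readPath}/training/',
    target_size=(100, 100),
    color_mode='grayscale',
    batch_size=32,
    classes=['cat', 'dog'],
    class_mode='binary'
)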
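The class-folder layout that flow_from_directory expects (one subfolder per class) can be built from a list of (image path, category) pairs such as the rows of the question's CSV. The following is only a sketch: arrange_by_class is a hypothetical helper, the demo file names and category numbers are invented, and copying is just one option (with 600k images you may prefer to move or symlink the files instead).

```python
import shutil
import tempfile
from pathlib import Path


def arrange_by_class(records, src_root, dst_root):
    """Copy each image into dst_root/<category>/ so every class has its own folder.

    records: iterable of (relative_image_path, category) pairs (assumed format).
    Returns the number of files copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = 0
    for rel_path, category in records:
        # Mirror the question's rule: add .jpg when the path lacks it
        name = rel_path if rel_path.endswith(".jpg") else rel_path + ".jpg"
        src = src_root / name
        class_dir = dst_root / str(category)
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, class_dir / src.name)
        copied += 1
    return copied


# Tiny demo on throwaway files (names and categories are made up)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "raw"
    dst = Path(tmp) / "training"
    src.mkdir()
    (src / "a.jpg").write_bytes(b"fake image bytes")
    (src / "b.jpg").write_bytes(b"fake image bytes")
    demo_count = arrange_by_class([("a", 3), ("b.jpg", 7)], src, dst)
    demo_layout = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*.jpg"))

print(demo_count, demo_layout)  # 2 ['3/a.jpg', '7/b.jpg']
```

With the files arranged this way, the classes argument can be omitted and the class names are taken from the subfolder names.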
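To train, you can use the model.fit_generator() function. In TensorFlow 2.1 and later, model.fit() accepts the generator directly and fit_generator() is deprecated.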
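If your images are listed in a CSV rather than sorted into class folders, the same idea (load only one batch at a time) can be written as a plain Python generator. This is a minimal sketch, not the Keras implementation: batch_generator, steps_per_epoch and the load_fn parameter are names made up here, and load_fn stands in for whatever function turns a path into an array (for example the load_image function from the question).

```python
import math

import numpy as np


def batch_generator(paths, labels, load_fn, batch_size=32, shuffle=True, seed=None):
    """Yield (images, labels) batches forever, holding only one batch in memory."""
    paths = list(paths)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    n = len(paths)
    while True:  # loop indefinitely so the generator can serve several epochs
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Load just this batch and scale pixel values to [0, 1]
            images = np.stack([load_fn(paths[i]) for i in idx]).astype("float32") / 255.0
            yield images, labels[idx]


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


# Demo with a fake loader so the sketch runs without any image files
def fake_load(path):
    return np.full((4, 4, 1), 255.0)


demo_paths = ["a", "b", "c", "d", "e"]
demo_labels = np.eye(5)
demo_gen = batch_generator(demo_paths, demo_labels, fake_load, batch_size=2, shuffle=False)
first_images, first_labels = next(demo_gen)
print(first_images.shape, first_labels.shape)  # (2, 4, 4, 1) (2, 5)
```

Such a generator would then be passed to training together with the step count, along the lines of model.fit(gen, steps_per_epoch=steps_per_epoch(len(paths), 32), epochs=...), where model is your compiled Keras model.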