How to use less memory for a large image dataset? (Python - Keras)

Question

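I am new to deep learning. I am trying to train a model that recognizes plant diseases, and I am using this dataset, which contains a large number of images. I know it is a lot of data, and I am only using the color subfolder. I want to use all of the data in that subfolder. The problem is that Kaggle currently provides only 13 GB of RAM, and my session keeps restarting because my script tries to use more memory than that. Here is my code: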

### There are some imports here that I removed because there is a lot of them

NUM_CLASSES = 38
IMG_SIZE = 150

x = []
y = []

def train_data_gen(DIR, ID):
    for img in os.listdir(DIR):
        try:
            path = DIR + '/' + img
            img = plt.imread(path)
            img = cv2.resize(img, (IMG_SIZE,IMG_SIZE))
            if img.shape == (IMG_SIZE, IMG_SIZE, 3):
                x.append(img)
                y.append(ID)
        except:
            None
#--
for DIR in os.listdir('../input/plantvillage-dataset/color/'):
    train_data_gen('../input/plantvillage-dataset/color/' + DIR, DIR)
    print(DIR)
#
print('reached label encoder')
le = LabelEncoder()
y = le.fit_transform(y)
x = np.array(x)
y = to_categorical(y, NUM_CLASSES)

print('data split')
x_train,x_test,y_train,y_test = train_test_split(x, y, test_size = 0.15)
x_train,x_val,y_train,y_val = train_test_split(x_train, y_train, test_size = 0.15)

print('datagen')
datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    samplewise_std_normalization=False,
    rotation_range=60,
    zoom_range = 0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    fill_mode = "reflect"
)
print('datagen_fit')
datagen.fit(x_train)

print('model')
model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), strides=2, padding='Same', activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(BatchNormalization())
#model.add(Dropout(0.2))
model.add(Conv2D(128, kernel_size=(3, 3), strides=2, padding='Same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(BatchNormalization())
#model.add(Dropout(0.3))
model.add(Conv2D(128, kernel_size=(3, 3), strides=2, padding='Same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(BatchNormalization())
#model.add(Dropout(0.3))
model.add(Conv2D(128, kernel_size=(3, 3), strides=2, padding='Same', activation='relu'))
#model.add(MaxPooling2D(pool_size=(2,2),strides=(2,2)))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
#model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(NUM_CLASSES, activation='softmax'))

print('Model compile')
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

print('Model Fit')
model.fit_generator(datagen.flow(x_train,y_train,batch_size=32), epochs=75, steps_per_epoch=x_train.shape[0]//32, validation_data=(x_val, y_val), verbose=1)

model.save('plantus_model')

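I have put print calls in my code to see where the problem actually is. It stops right when I fit the datagen. I don't think a single step is consuming that much memory; rather, it is everything before it added together. How can I reduce RAM usage so that I can actually start training my model? Thanks in advance for your answers and constructive feedback.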

Answer 1

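You are calling datagen.fit. That is only required if you set one of the options that depend on dataset-wide statistics (featurewise_center, featurewise_std_normalization or zca_whitening) to True; samplewise_center and samplewise_std_normalization are computed per image and do not need it. Since you are not enabling any of these, you do not need to fit the generator on the dataset. Dropping that call should keep you from using too much memory.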
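To see why that one call can tip the session over the limit, here is a rough estimate of the array sizes involved. It is a sketch under assumptions, not a measurement: num_images is a hypothetical round figure (the post does not give the image count), the images are assumed to be held as 8-bit values after loading, and, as far as I understand the implementation, ImageDataGenerator.fit converts its input to the generator's float dtype (float32 by default) before doing anything else.

```python
# Back-of-the-envelope RAM estimate for the script in the question.
# ASSUMPTION: num_images is a hypothetical round figure, not taken from the post.
IMG_SIZE = 150
CHANNELS = 3
GIB = 1024 ** 3


def array_gib(n_images, bytes_per_value):
    """Size in GiB of an array shaped (n_images, IMG_SIZE, IMG_SIZE, CHANNELS)."""
    return n_images * IMG_SIZE * IMG_SIZE * CHANNELS * bytes_per_value / GIB


num_images = 54_000                           # assumed dataset size
num_train = num_images * 85 * 85 // 10_000    # left after the two 15% splits

full_uint8 = array_gib(num_images, 1)     # x = np.array(x), one byte per value
train_uint8 = array_gib(num_train, 1)     # x_train copy made by train_test_split
train_float32 = array_gib(num_train, 4)   # a float32 copy of x_train

print(f"all images, uint8:     {full_uint8:.1f} GiB")
print(f"training set, uint8:   {train_uint8:.1f} GiB")
print(f"training set, float32: {train_float32:.1f} GiB")
```

Under these assumptions the float32 copy alone is four times the size of x_train, and it comes on top of the Python list, the full array and the split copies that are still alive at that point.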

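If removing the fit call is not enough, another option is to keep only file paths in memory and decode images one batch at a time. Keras offers this through ImageDataGenerator.flow_from_directory; the sketch below only illustrates the principle in plain Python. The names list_samples, batches and load_image are hypothetical helpers introduced for this example, and the folder layout root/<class_name>/<image> is assumed from the paths in the question.

```python
import os
import random


def list_samples(root):
    """Return (path, class_name) pairs for a root/<class_name>/<image> layout."""
    samples = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        for file_name in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, file_name), class_name))
    return samples


def batches(samples, batch_size, load_image, seed=0):
    """Yield (images, labels) lists, decoding only batch_size files at a time.

    load_image is a hypothetical callable (path -> image array), for example
    a small wrapper that reads a file and resizes it to IMG_SIZE x IMG_SIZE.
    """
    order = list(samples)
    random.Random(seed).shuffle(order)  # shuffle paths, not pixel data
    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        images = [load_image(path) for path, _ in chunk]
        labels = [label for _, label in chunk]
        yield images, labels
```

With this pattern, peak memory is bounded by one batch of decoded images plus the list of paths, instead of growing with the size of the dataset.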
Solution 1
1 accepted 2020-10-04 16:16:32