Why does my kernel die every time I run train-test split on this particular dataset?

Question

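I have used train-test split before without any problems. I have a fairly large (1 GB) dataset for my CNN and tried to use it on that, but my kernel dies every time. I have read that passing shuffle=False sometimes helps. I tried that, with no luck. My code is included below. Any help would be appreciated!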

import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import accuracy_score
np.random.seed(42)
data_dir='birds/'
train_path=data_dir+'/train'
test_path=data_dir+'/test'
img_size=(100,100)
channels=3
num_categories=len(os.listdir(train_path))
#get list of each category to zip
names_of_species=[]

for i in os.listdir(train_path):
    names_of_species.append(i)

#make list of numbers from 0-299:
num_list=[]
for i in range(300):
    num_list.append(i)
nums_and_names=dict(zip(num_list, names_of_species))
folders=os.listdir(train_path)
import random
from matplotlib.image import imread
df=pd.read_csv(data_dir+'/Bird_Species.csv')

img_data=[]
img_labels=[]

for i in nums_and_names:
    path=data_dir+'train/'+str(names_of_species[i])
    images=os.listdir(path)
    
    for img in images:
        try:
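            # cv2.imread returns BGR, so convert before treating the array as RGB
            image=cv2.cvtColor(cv2.imread(path+'/'+img), cv2.COLOR_BGR2RGB)
            image_fromarray=Image.fromarray(image, 'RGB')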
            resize_image=image_fromarray.resize((img_size))
            img_data.append(np.array(resize_image))
            img_labels.append(num_list[i])
        except:
            print("Error in "+img)
img_data=np.array(img_data)
img_labels=np.array(img_labels)
img_labels
# Output: array([210,  41, 148, ...,  15, 115, 292])
#SHUFFLE TRAINING DATA

shuffle_indices=np.arange(img_data.shape[0])
np.random.shuffle(shuffle_indices)
img_data=img_data[shuffle_indices]
img_labels=img_labels[shuffle_indices]
#Split the data

X_train, X_test, y_train, y_test=train_test_split(img_data,img_labels, test_size=0.2,random_state=42, shuffle=False)

#Normalize pixel values to [0, 1]
X_train=X_train/255
X_test=X_test/255

Answer 1

This means you have probably run out of RAM or GPU memory.

To check on Windows, open Task Manager (Ctrl+Shift+Esc), go to the Performance tab, run the code, and watch the RAM usage and GPU memory usage to see whether either of them is the cause.

Note: to monitor GPU memory, watch "Dedicated GPU memory", which is shown at the bottom left when you click on the GPU entry.
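
As a rough cross-check that does not depend on Task Manager, the size of the arrays can also be estimated directly. The following is a minimal sketch, not a measurement of the asker's data: the image count of 45,000 is a made-up placeholder (the question does not give the real number), and `array_gigabytes` is an illustrative helper rather than a library function. It also shows that dividing a uint8 array by 255 yields float64, which takes eight times the memory of the original.

```python
import numpy as np

def array_gigabytes(n_images, img_size=(100, 100), channels=3, dtype=np.uint8):
    """Return the size in GiB of an image array with the given shape and dtype."""
    n_values = n_images * img_size[0] * img_size[1] * channels
    return n_values * np.dtype(dtype).itemsize / 1024**3

n_images = 45_000  # hypothetical count; substitute img_data.shape[0]

print(f"uint8 images: {array_gigabytes(n_images):.2f} GiB")
print(f"float64 copy: {array_gigabytes(n_images, dtype=np.float64):.2f} GiB")

# Dividing a uint8 array by a Python int promotes the result to float64
sample = np.zeros((2, 100, 100, 3), dtype=np.uint8)
print((sample / 255).dtype)
```

If the float64 figure is close to the machine's free RAM, casting to float32 before scaling, or scaling one batch at a time, is worth trying.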

Answer 2

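Adding to MK's answer: if your kernel really is crashing because of RAM/GPU limits, you can try loading the data in batches. Instead of splitting the whole dataset at once, try splitting a quarter of it at a time.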
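
One way to load in batches is a generator that reads only a handful of images at a time. This is a sketch under assumptions: `iter_batches` and `fake_load` are hypothetical names, the file names are invented, and `fake_load` stands in for whatever actually reads and resizes one image (for example the cv2/PIL code from the question).

```python
import numpy as np

def iter_batches(paths, labels, batch_size, load_fn):
    """Yield (images, labels) batches, loading only batch_size images at a time."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # Cast to float32 before scaling so the batch does not become float64
        images = np.stack([load_fn(p) for p in chunk]).astype(np.float32) / 255.0
        yield images, np.asarray(labels[start:start + batch_size])

# Stand-in loader so the sketch runs without image files (assumption);
# in practice this would read and resize a single image from disk.
def fake_load(path):
    return np.zeros((100, 100, 3), dtype=np.uint8)

paths = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names
labels = list(range(10))

batch_shapes = [x.shape for x, y in iter_batches(paths, labels, 4, fake_load)]
print(batch_shapes)
```

Only one batch of pixel data is alive at a time, so peak memory depends on `batch_size` rather than on the size of the dataset.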

Answer 3

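Note that after splitting the data you are essentially holding two copies of the same data (the original (img_data, img_labels) and the split versions). If you are running out of memory, the best approach is to manage the split through an index array, from which you can pull batches on demand.

Create a shuffled index array: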

shuffle_indices = np.random.permutation(img_data.shape[0])

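This does the same thing as your two lines (np.arange followed by np.random.shuffle), in one step.

Split the indices into those for the training set and those for the test set:

train_indices, test_indices = train_test_split(shuffle_indices, test_size=0.2, random_state=42, shuffle=False)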

Then iterate over the batches (n_epochs and batch_size are assumed to be defined elsewhere):

n_train = len(train_indices)
for epoch in range(n_epochs):
    # further shuffle the training data for each epoch, if desired
    epoch_shuffle = np.random.permutation(n_train)

    for start in range(0, n_train, batch_size):
        # get data batches
        batch_idx = train_indices[epoch_shuffle[start : start + batch_size]]
        x_batch = img_data[batch_idx]
        y_batch = img_labels[batch_idx]

        # train model
        ...
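
To see the index-based approach end to end on something small, here is a sketch on synthetic data. The array shapes, the batch size, and the use of plain slicing in place of train_test_split(..., shuffle=False) are all assumptions made for illustration. It additionally scales each batch to float32 as it is drawn, rather than dividing the whole array by 255 up front.

```python
import numpy as np

rng = np.random.default_rng(42)

# Small synthetic stand-ins for img_data / img_labels (assumption)
img_data = rng.integers(0, 256, size=(50, 8, 8, 3), dtype=np.uint8)
img_labels = rng.integers(0, 5, size=50)

# Shuffle and split the indices only; the image array is never duplicated
indices = rng.permutation(len(img_data))
n_test = len(indices) // 5  # hold out 20%
train_indices, test_indices = indices[:-n_test], indices[-n_test:]

batch_size = 16
seen = []
epoch_order = rng.permutation(len(train_indices))
for start in range(0, len(train_indices), batch_size):
    batch_idx = train_indices[epoch_order[start:start + batch_size]]
    # Scale one batch at a time, as float32, instead of the whole array
    x_batch = img_data[batch_idx].astype(np.float32) / 255.0
    y_batch = img_labels[batch_idx]
    seen.extend(batch_idx.tolist())

print(len(train_indices), len(test_indices), x_batch.dtype)
```

Each training index is visited exactly once per epoch, and the only extra memory beyond the original arrays is the index arrays plus a single batch.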

Answer 4

I ran into the same problem when I was using

from sklearnex import patch_sklearn
patch_sklearn()

It would always crash at random points in the code, especially after train_test_split.

Answer 1
2 2021-09-30 16:41:04

Answer 2
0 2021-09-30 18:04:48

Answer 3
0 2021-09-30 19:08:40

Answer 4
0 2022-08-30 22:39:49