Why does my kernel die every time I run train-test split on this particular dataset?

I've used train-test split before and haven't had any issues. I have a rather large (1 GB) dataset for my CNN, and when I tried using it my kernel died every time. I've read that it sometimes helps to pass shuffle=False. I tried that with no luck. I've included my code below. Any help would be appreciated!!

import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import accuracy_score
np.random.seed(42)
data_dir='birds/'
train_path=data_dir+'/train'
test_path=data_dir+'/test'
img_size=(100,100)
channels=3
num_categories=len(os.listdir(train_path))
#get list of each category to zip
names_of_species=[]

for i in os.listdir(train_path):
    names_of_species.append(i)

#make list of numbers from 0-299:
num_list=[]
for i in range(300):
    num_list.append(i)
nums_and_names=dict(zip(num_list, names_of_species))
folders=os.listdir(train_path)
import random
from matplotlib.image import imread
df=pd.read_csv(data_dir+'/Bird_Species.csv')

img_data=[]
img_labels=[]

for i in nums_and_names:
    path=data_dir+'train/'+str(names_of_species[i])
    images=os.listdir(path)
    
    for img in images:
        try:
            # note: cv2.imread returns BGR channel order; convert if true RGB matters
            image=cv2.imread(path+'/'+img)
            image_fromarray=Image.fromarray(image, 'RGB')
            resize_image=image_fromarray.resize(img_size)
            img_data.append(np.array(resize_image))
            img_labels.append(num_list[i])
        except Exception:
            print("Error in "+img)
img_data=np.array(img_data)
img_labels=np.array(img_labels)
img_labels  # array([210,  41, 148, ...,  15, 115, 292])
#SHUFFLE TRAINING DATA

shuffle_indices=np.arange(img_data.shape[0])
np.random.shuffle(shuffle_indices)
img_data=img_data[shuffle_indices]
img_labels=img_labels[shuffle_indices]
#Split the data

X_train, X_test, y_train, y_test=train_test_split(img_data,img_labels, test_size=0.2,random_state=42, shuffle=False)

#Normalize pixel values to [0, 1]
X_train=X_train/255
X_test=X_test/255

This means that you are probably running out of RAM or GPU memory.

To check on Windows, open Task Manager (Ctrl+Shift+Esc), go to the Performance tab, run the code, and check the RAM usage and GPU memory usage to determine whether either of them is the cause.

Note: To monitor GPU memory you should monitor "Dedicated GPU Memory", which can be found in the bottom left when you click on GPU.
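As a rough sanity check that the numbers plausibly overflow RAM (the image count below is illustrative, not from the question): the images are 100x100x3 uint8, train_test_split returns copies rather than views, and X_train/255 promotes uint8 to float64, an array eight times larger.

n_images = 60_000                  # illustrative count for a ~1 GB image folder
bytes_per_image = 100 * 100 * 3    # uint8: one byte per channel value

print(n_images * bytes_per_image / 1e9)      # ~1.8 GB for img_data alone
# train_test_split copies the arrays, roughly doubling peak usage
print(2 * n_images * bytes_per_image / 1e9)  # ~3.6 GB during the split
# dividing by 255 yields float64, eight bytes per value
print(n_images * bytes_per_image * 8 / 1e9)  # ~14.4 GB as float64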

Adding to MK's answer: if the cause of your kernel crash is indeed the RAM/GPU limit, you could try to load your data in batches. Instead of splitting the entire dataset at the same time, try dividing it, say, a quarter at a time.
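One way to do that with the tools already imported in the question is to stream images from disk with ImageDataGenerator.flow_from_directory instead of building one big array. A minimal sketch, assuming the birds/train folder contains one subfolder per species (the batch size and validation split here are illustrative):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Batches are read from disk on demand, so the full dataset never sits in RAM
train_gen = datagen.flow_from_directory(
    'birds/train',
    target_size=(100, 100),
    batch_size=32,
    class_mode='sparse',   # one integer label per species subfolder
    subset='training')
val_gen = datagen.flow_from_directory(
    'birds/train',
    target_size=(100, 100),
    batch_size=32,
    class_mode='sparse',
    subset='validation')

# model.fit(train_gen, validation_data=val_gen, epochs=10)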

Notice that after splitting the data you are basically keeping two instances of the same data: the original (img_data, img_labels) arrays and the split copies. If you are running out of memory, the best approach is to manage it via an index array from which you pull batches as you need them.

Create a shuffled array of indices:

shuffle_indices = np.random.permutation(img_data.shape[0])

which does the same as your two lines in one step.

Split the indices corresponding to points in the train and test sets:

train_indices, test_indices = train_test_split(shuffle_indices, test_size=0.2, random_state=42, shuffle=False)

Then, iterate over batches:

n_train = len(train_indices)
for epoch in range(n_epochs):
    # further shuffle the training data for each iteration, if desired
    epoch_shuffle = np.random.permutation(n_train)

    for i in range(0, n_train, batch_size):
        # get data batches by index; only the batch is materialized
        batch_idx = train_indices[epoch_shuffle[i : i + batch_size]]
        x_batch = img_data[batch_idx]
        y_batch = img_labels[batch_idx]

        # train model
        ...
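If you train with Keras, the same index-array idea can be wrapped in a tf.keras.utils.Sequence so that model.fit pulls batches on demand. A sketch under the same assumptions (the class name and batch size are illustrative):

import numpy as np
import tensorflow as tf

class IndexBatchLoader(tf.keras.utils.Sequence):
    # Serves (x, y) batches through an index array, so no second
    # copy of img_data/img_labels is ever materialized.
    def __init__(self, data, labels, indices, batch_size=32):
        self.data, self.labels = data, labels
        self.indices = np.array(indices)   # copy, so shuffling stays local
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.indices) / self.batch_size))

    def __getitem__(self, i):
        idx = self.indices[i * self.batch_size : (i + 1) * self.batch_size]
        # normalize per batch instead of creating a float copy of everything
        return self.data[idx].astype('float32') / 255.0, self.labels[idx]

    def on_epoch_end(self):
        np.random.shuffle(self.indices)

# train_loader = IndexBatchLoader(img_data, img_labels, train_indices)
# model.fit(train_loader, epochs=10)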

I had the same problem when I used

from sklearnex import patch_sklearn
patch_sklearn()

It would always crash at random points in the code, especially after a train_test_split.
