
Keras code not working in Jupyter: “The kernel appears to have died. It will restart automatically.”

I am writing Keras code for a simple deep-learning-based 30x30 cat image classifier. When I get to the portion of my code that is supposed to train the model, Jupyter stops running and gives the error message "The kernel appears to have died. It will restart automatically." I do not know what is causing this. If I look in the terminal, I see CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 11520114688. I would think a simple classifier would not exhaust the resources of my PC: I have an RTX 2080 Ti, 32 GB of RAM, and an i9-9900K.

I don't know whether it's a software compatibility issue or something else, but I do know that tensorflow-gpu is working, because my console says so. The code I'm using is essentially verbatim from a book on deep learning with Keras. It ran fine on my six-year-old laptop, although it trained very slowly.

```python
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(150,150,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
```

After running the block above, I get these warnings:

```
WARNING:tensorflow:From /home/name/venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/name/venv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
```

```python
from keras import optimizers
model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=1e-4), metrics=['acc'])

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

import os, shutil
base_dir = '/home/name/Desktop/Deep Learning/Cat-Dog exercise/cats_and_dogs_small'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')

train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(150,150),
                                                    batch_size=10,
                                                    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        target_size=(150,150),
                                                        batch_size=10,
                                                        class_mode='binary')
history = model.fit_generator(train_generator,
                              steps_per_epoch=10,
                              epochs=100,
                              validation_data=validation_generator,
                              validation_steps=50)
model.save('cats_and_dogs_small_2.h5')
```

After I run this, this is the output:

```
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
WARNING:tensorflow:From /home/name/venv/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/100
```

It then crashes here and I get the Jupyter error.

Additional copy-paste from terminal:

[W 19:44:35.199 NotebookApp] Notebook Desktop/Deep Learning/TF test.ipynb is not trusted
[I 19:44:35.307 NotebookApp] Kernel started: ae9c0530-bdbb-4748-a8b1-4e9a98fad3b8
[I 19:44:35.669 NotebookApp] Adapting to protocol v5.1 for kernel ae9c0530-bdbb-4748-a8b1-4e9a98fad3b8
2019-11-11 19:44:38.840439: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-11 19:44:38.956151: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-11 19:44:38.957106: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x28f53e0 executing computations on platform CUDA. Devices:
2019-11-11 19:44:38.957121: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-11-11 19:44:38.984940: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-11-11 19:44:38.985600: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2953090 executing computations on platform Host. Devices:
2019-11-11 19:44:38.985611: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-11 19:44:38.986045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.18GiB
2019-11-11 19:44:38.986059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-11 19:44:38.987129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-11 19:44:38.987140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-11 19:44:38.987144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-11 19:44:38.987375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 9903 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
[I 19:44:43.456 NotebookApp] Saving file at /Desktop/Deep Learning/TF test.ipynb
[I 19:44:43.565 NotebookApp] Starting buffering for ae9c0530-bdbb-4748-a8b1-4e9a98fad3b8:a4ca004462174782820be082f11422c7
[W 19:44:46.645 NotebookApp] Notebook Desktop/Deep Learning/5-2 Cat,Dog.ipynb is not trusted
[I 19:44:46.838 NotebookApp] Kernel started: 38279d87-721e-44c2-a40e-48cd4c7ba1c4
[I 19:44:47.142 NotebookApp] Adapting to protocol v5.1 for kernel 38279d87-721e-44c2-a40e-48cd4c7ba1c4
2019-11-11 19:45:18.007748: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-11 19:45:18.095869: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-11 19:45:18.096371: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x37ab830 executing computations on platform CUDA. Devices:
2019-11-11 19:45:18.096384: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-11-11 19:45:18.116837: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-11-11 19:45:18.117825: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2b14f90 executing computations on platform Host. Devices:
2019-11-11 19:45:18.117837: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-11 19:45:18.118228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.03GiB
2019-11-11 19:45:18.118258: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-11 19:45:18.118879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-11 19:45:18.118886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-11 19:45:18.118907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-11 19:45:18.119131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9756 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-11-11 19:45:19.008213: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
[I 19:46:32.074 NotebookApp] Kernel interrupted: 38279d87-721e-44c2-a40e-48cd4c7ba1c4
[I 19:46:36.231 NotebookApp] Saving file at /Desktop/Deep Learning/5-2 Cat,Dog.ipynb
[W 19:46:36.232 NotebookApp] Notebook Desktop/Deep Learning/5-2 Cat,Dog.ipynb is not trusted
[I 19:46:36.348 NotebookApp] Starting buffering for 38279d87-721e-44c2-a40e-48cd4c7ba1c4:de4e1e737fef45e0a9393696f2ee53b4
[I 19:46:39.645 NotebookApp] Kernel started: 6192e4fe-6840-45b0-9f26-ffec2efcd443
[I 19:46:39.902 NotebookApp] Adapting to protocol v5.1 for kernel 6192e4fe-6840-45b0-9f26-ffec2efcd443
2019-11-11 19:46:53.523436: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-11 19:46:53.634301: W tensorflow/compiler/xla/service/platform_util.cc:240] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 11520114688
2019-11-11 19:46:53.634372: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
[I 19:46:54.645 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 6192e4fe-6840-45b0-9f26-ffec2efcd443 restarted
[I 19:48:39.650 NotebookApp] Saving file at /Desktop/Deep Learning/5-2 cats,dogs w regularization.ipynb
2019-11-11 19:49:31.150869: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-11 19:49:31.223933: W tensorflow/compiler/xla/service/platform_util.cc:240] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 11520114688
2019-11-11 19:49:31.224025: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
[I 19:49:33.653 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 6192e4fe-6840-45b0-9f26-ffec2efcd443 restarted
2019-11-11 19:49:51.505573: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-11 19:49:51.579785: W tensorflow/compiler/xla/service/platform_util.cc:240] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 11520114688
2019-11-11 19:49:51.579900: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
[I 19:49:51.665 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 6192e4fe-6840-45b0-9f26-ffec2efcd443 restarted
[I 19:50:39.647 NotebookApp] Saving file at /Desktop/Deep Learning/5-2 cats,dogs w regularization.ipynb
[E 19:58:52.375 NotebookApp] nbconvert failed: No template_file specified!
    Traceback (most recent call last):
      File "/home/name/venv/lib/python3.6/site-packages/notebook/nbconvert/handlers.py", line 130, in get
        resources=resource_dict
      File "/home/name/venv/lib/python3.6/site-packages/nbconvert/exporters/templateexporter.py", line 315, in from_notebook_node
        output = self.template.render(nb=nb_copy, resources=resources)
      File "/home/name/venv/lib/python3.6/site-packages/nbconvert/exporters/templateexporter.py", line 113, in template
        self._template_cached = self._load_template()
      File "/home/name/venv/lib/python3.6/site-packages/nbconvert/exporters/templateexporter.py", line 278, in _load_template
        raise ValueError("No template_file specified!")
    ValueError: No template_file specified!
[W 19:58:52.376 NotebookApp] 500 GET /nbconvert/custom/Desktop/Deep%20Learning/5-2%20cats%2Cdogs%20w%20regularization.ipynb?download=true (127.0.0.1): nbconvert failed: No template_file specified!
[E 19:58:52.379 NotebookApp] {
      "Host": "localhost:8888",
      "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0",
      "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
      "Accept-Language": "en-US,en;q=0.5",
      "Accept-Encoding": "gzip, deflate",
      "Connection": "keep-alive",
      "Referer": "http://localhost:8888/notebooks/Desktop/Deep%20Learning/5-2%20cats%2Cdogs%20w%20regularization.ipynb",
      "Cookie": "_xsrf=2|79acfc45|e59d0dfc567c5357dd03be1910afbf02|1572817491; username-localhost-8888=\"2|1:0|10:1573523050|23:username-localhost-8888|44:N2U5M2FmY2QzYTE1NDBhZDg4NGY3N2U2MWU4MjYwOTU=|9895ebebc839d5dc7ba5a25aa7fb16775d32c546b63da08b9acd1c060bac0dde\"",
      "Upgrade-Insecure-Requests": "1"

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   52C    P8    34W / 260W |    428MiB / 10986MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1253      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      1364      G   /usr/bin/gnome-shell                          58MiB |
|    0      3027      G   /usr/lib/xorg/Xorg                           188MiB |
|    0      3164      G   /usr/bin/gnome-shell                         139MiB |
|    0      7687      G   /usr/lib/firefox/firefox                       6MiB |
|    0      8085      G   /usr/lib/firefox/firefox                       6MiB |
+-----------------------------------------------------------------------------+
```

Package Version


absl-py 0.7.0
astor 0.7.1
attrs 18.2.0
backcall 0.1.0
bleach 3.1.0
cycler 0.10.0
decorator 4.3.2
defusedxml 0.5.0
entrypoints 0.3
gast 0.2.2
grpcio 1.19.0
h5py 2.9.0
ipykernel 5.1.0
ipython 7.3.0
ipython-genutils 0.2.0
ipywidgets 7.4.2
jedi 0.13.3
Jinja2 2.10
jsonschema 3.0.0
jupyter 1.0.0
jupyter-client 5.2.4
jupyter-console 6.0.0
jupyter-core 4.4.0
Keras 2.2.4
Keras-Applications 1.0.7
Keras-Preprocessing 1.0.9
kiwisolver 1.0.1
Markdown 3.0.1
MarkupSafe 1.1.1
matplotlib 3.0.2
mistune 0.8.4
mock 2.0.0
nbconvert 5.4.1
nbformat 4.4.0
notebook 5.7.4
numpy 1.16.2
pandas 0.24.1
pandocfilters 1.4.2
parso 0.3.4
pbr 5.1.3
pexpect 4.6.0
pickleshare 0.7.5
Pillow 5.4.1
pip 19.3.1
pkg-resources 0.0.0
prometheus-client 0.6.0
prompt-toolkit 2.0.9
protobuf 3.7.0
ptyprocess 0.6.0
Pygments 2.3.1
pyparsing 2.3.1
pyrsistent 0.14.11
python-dateutil 2.8.0
pytz 2018.9
PyYAML 3.13
pyzmq 18.0.0
qtconsole 4.4.3
scikit-learn 0.20.2
scipy 1.2.1
Send2Trash 1.5.0
setuptools 40.8.0
six 1.12.0
tensorboard 1.13.0
tensorflow-estimator 1.13.0
tensorflow-gpu 1.13.1
termcolor 1.1.0
terminado 0.8.1
testpath 0.4.2
torch 1.0.1.post2
torchvision 0.2.2.post2
tornado 5.1.1
traitlets 4.3.2
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.14.1
wheel 0.33.1
widgetsnbextension 3.4.2

target_size is a tuple of integers (height, width), default (256, 256): the dimensions to which every image found is resized. Your images are 30x30, but target_size is resizing them to 150x150, which takes a chunk of additional memory. You can tell that your data exceeds the available memory from this line: totalMemory: 10.73GiB freeMemory: 10.03GiB. I would suggest reducing the batch_size (e.g. start with 4 images per batch and increase it until you hit this error again).
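As a minimal sketch of that suggestion, reusing the generator and directory variables from the question (the value 4 is only an illustrative starting point, not a tuned setting):

```python
# Recreate the generators with a smaller batch_size and retrain;
# raise batch_size again until the out-of-memory error reappears.
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(150, 150),
                                                    batch_size=4,   # was 10
                                                    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        target_size=(150, 150),
                                                        batch_size=4,   # was 10
                                                        class_mode='binary')

history = model.fit_generator(train_generator,
                              steps_per_epoch=10,
                              epochs=100,
                              validation_data=validation_generator,
                              validation_steps=50)
```

If you want to feed the images at their native 30x30 size instead, target_size=(30, 30) would avoid the upscaling, but the model's first layer would then need input_shape=(30, 30, 3), and the network would likely need fewer conv/pool blocks, since a 30x30 feature map shrinks below 3x3 after a few poolings.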
