
Python kernel dies on Jupyter Notebook with tensorflow 2

I installed tensorflow 2 on my mac using conda according to these instructions:

conda create -n tf2 tensorflow

Then I installed ipykernel to add this new environment to my jupyter notebook kernels as follows:

conda activate tf2
conda install ipykernel
python -m ipykernel install --user --name=tf2

That seemed to work well: I am able to see my tf2 environment among my jupyter notebook kernels.

Then I tried to run the simple MNIST example to check that everything was working properly, and when I execute this line of code:

model.fit(x_train, y_train, epochs=5)

The kernel of my jupyter notebook dies without giving any further information.

[Image: dead kernel]

I executed the same code in my terminal via python mnist_test.py and also via ipython (command by command) and had no issues, which leads me to assume that tensorflow 2 is correctly installed in my conda environment.
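Since the code behaves differently in the terminal and in the notebook, one way to narrow things down is to print the same runtime details in both places and compare them. The following is only a minimal sketch: the helper name describe_runtime is hypothetical, and the choice of which values to report is an assumption about what is likely to differ.

```python
import os
import sys


def describe_runtime():
    """Collect details that may differ between a terminal session and a notebook kernel."""
    return {
        # Interpreter actually running this code
        "executable": sys.executable,
        # Root of the environment the interpreter belongs to
        "prefix": sys.prefix,
        # Set by `conda activate`; may be missing inside a kernel
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # OpenMP workaround variable mentioned in the error text
        "kmp_duplicate_lib_ok": os.environ.get("KMP_DUPLICATE_LIB_OK"),
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Running this once with python in the terminal and once in a notebook cell shows whether both are using the same interpreter and environment.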

Any ideas on what went wrong during the install?

Versions:

python==3.7.5
tensorboard==2.0.0
tensorflow==2.0.0
tensorflow-estimator==2.0.0
ipykernel==5.1.3
ipython==7.10.2
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==5.2.0
jupyter-core==4.6.1

Here is the complete script as well as the STDOUT of the execution:

import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train, x_test = x_train / 255.0, x_test / 255.0

nn_model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

nn_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

nn_model.fit(x_train, y_train, epochs=5)

nn_model.evaluate(x_test,  y_test, verbose=2)

(tf2) ➜ tensorflow2 python mnist_test.py
2020-01-03 10:46:10.854619: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-01-03 10:46:10.854860: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 6s 102us/sample - loss: 0.3018 - accuracy: 0.9140
Epoch 2/5
60000/60000 [==============================] - 6s 103us/sample - loss: 0.1437 - accuracy: 0.9571
Epoch 3/5
60000/60000 [==============================] - 6s 103us/sample - loss: 0.1054 - accuracy: 0.9679
Epoch 4/5
60000/60000 [==============================] - 6s 103us/sample - loss: 0.0868 - accuracy: 0.9729
Epoch 5/5
60000/60000 [==============================] - 6s 103us/sample - loss: 0.0739 - accuracy: 0.9772
10000/1 - 1s - loss: 0.0359 - accuracy: 0.9782
(tf2) ➜ tensorflow2

After trying different things, I ran jupyter notebook in debug mode using the command:

jupyter notebook --debug

Then, after executing the commands in my notebook, I got the error message:

 OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
 OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Following this discussion, installing nomkl in the virtual environment worked for me.

conda install nomkl

I can't exactly guess the problem you are having, but it looks like it has to do with a version clash. Do the following (this is what I did, and it works for me):

  1. conda create -n tf2 python=3.7 ipython ipykernel
  2. conda activate tf2
  3. conda install -c anaconda tensorflow
  4. python -m ipykernel install --user --name=tf2
  5. Run the model again and see if it is working.
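After step 4 it may be worth confirming that the registered kernel really launches the interpreter from the tf2 environment. This is only a rough sketch: kernel_interpreter is a hypothetical helper, and the location of kernel.json is an assumption you would need to look up yourself (jupyter kernelspec list prints the directory of each kernel).

```python
import json
from pathlib import Path


def kernel_interpreter(kernel_json_path):
    """Return the interpreter path (first argv entry) recorded in a kernel.json file."""
    spec = json.loads(Path(kernel_json_path).read_text())
    # "argv" is the command Jupyter runs to start the kernel;
    # its first element is the Python executable.
    return spec["argv"][0]
```

If the returned path does not point inside the tf2 environment, the notebook is running a different Python than the one tested in the terminal.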

Try conda install nomkl. If you still face the problem, check your anaconda/lib folder: run ll lib*omp* and see whether there is an old libiomp5.dylib file. If there is, remove it.
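The same check can be done from Python, which is handy inside a notebook. This is a minimal sketch, assuming the libraries live in the lib folder under the active environment's prefix; find_openmp_libs is a hypothetical helper name, and the pattern simply mirrors lib*omp* from the command above.

```python
import sys
from pathlib import Path


def find_openmp_libs(lib_dir):
    """Return the sorted file names in lib_dir that match lib*omp*."""
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("lib*omp*"))


if __name__ == "__main__":
    # Assumption: the environment keeps its shared libraries in <prefix>/lib
    env_lib = Path(sys.prefix) / "lib"
    for name in find_openmp_libs(env_lib):
        print(name)
```

It only lists the matching files; whether any of them is stale and safe to remove still has to be judged by hand.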

For me, this issue occurred at the point marked by the red arrow in the screenshot below. After debugging in jupyter, I realised that it happens while serialized data is being streamed from the tensorboard directory. If I change model_dir="someothername", it works like a charm.

[Image]

Installing nomkl fixed it for me.

Try conda install nomkl, or install it from Environments in Anaconda Navigator.

TensorFlow GPU does not support versions 12.0 and higher. Use:

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
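Note that the OMP error text itself describes this variable as an unsafe, unsupported workaround. If you use it anyway, the ordering probably matters: the sketch below assumes the variable has to be set before tensorflow is imported, since the OpenMP runtime is initialized when the library loads. The helper name allow_duplicate_openmp is hypothetical.

```python
import os


def allow_duplicate_openmp():
    """Set KMP_DUPLICATE_LIB_OK and return the value it had before (or None)."""
    previous = os.environ.get("KMP_DUPLICATE_LIB_OK")
    os.environ["KMP_DUPLICATE_LIB_OK"] = "True"
    return previous


# Call this in the first notebook cell, before any heavy imports.
allow_duplicate_openmp()
# import tensorflow as tf  # import only after the variable has been set
```

Returning the previous value makes it easy to see whether the variable was already set by the shell that launched the kernel.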
