Keras with tensorflow-gpu totally freezes PC
I have a pretty simple LSTM network architecture. After 1-2 epochs my PC freezes completely; I can't even move the mouse:
Layer (type) Output Shape Param #
=================================================================
lstm_4 (LSTM) (None, 128) 116224
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_5 (Dense) (None, 98) 12642
=================================================================
Total params: 128,866
Trainable params: 128,866
Non-trainable params: 0
# The same problem occurs with a 2-layer LSTM with dropout and the Adam optimizer.
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import RMSprop

SEQUENCE_LENGTH = 3  # len(chars) == 98 (chars is defined in the full code)
model = Sequential()
model.add(LSTM(128, input_shape=(SEQUENCE_LENGTH, len(chars))))
# model.add(Dropout(0.15))
# model.add(LSTM(128))
model.add(Dropout(0.10))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01), metrics=['accuracy'])
This is how I train:
history = model.fit(X, y, validation_split=0.20, batch_size=128, epochs=10, shuffle=True, verbose=2).history
The network needs 5 minutes to finish 1 epoch. A larger batch size doesn't make the problem occur sooner. A more complex model can train for longer, reaching almost the same accuracy of about 0.46 (full code here).
I have the latest, up-to-date Linux Mint, a GTX 1070 Ti with 8 GB, and 32 GB of RAM.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26 Driver Version: 396.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... Off | 00000000:08:00.0 On | N/A |
| 0% 35C P8 10W / 180W | 303MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Libraries:
Keras==2.2.0
Keras-Applications==1.0.2
Keras-Preprocessing==1.0.1
keras-sequential-ascii==0.1.1
keras-tqdm==2.0.1
tensorboard==1.8.0
tensorflow==1.0.1
tensorflow-gpu==1.8.0
I have tried limiting GPU memory usage, but that can't be the problem here, because training uses only 1 GB of GPU memory:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))
What is wrong here? How can I fix the problem?
I had this exact problem. The computer died after about 15 minutes of training. I found that the cause was a memory SIMM card that failed when it got warm. If you have more than one SIMM card, you can take them out one at a time and see which one is the culprit.
Remove the tensorflow==1.0.1 CPU version first. Try installing tensorflow-gpu==1.8.0 by building TensorFlow from sources as mentioned here
or
Replace LSTM with CuDNNLSTM while training the model on GPU. Later, load the trained model weights into the same model architecture with an LSTM layer to use the model on CPU. (Make sure to use recurrent_activation='sigmoid' in the LSTM layer when re-loading CuDNNLSTM model weights!)
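The CuDNNLSTM suggestion above could look roughly like the sketch below. It is not taken from the original answer: build_model, transfer_weights and the file name lstm_weights.h5 are hypothetical names, the layer sizes are copied from the question, and it assumes a Keras 2.x install (such as the 2.2.0 listed in the question) that provides keras.layers.CuDNNLSTM and converts the weight layout when loading into a plain LSTM.

```python
def build_model(seq_len, n_chars, use_cudnn):
    """Build the question's one-layer model with CuDNNLSTM (GPU) or LSTM (CPU)."""
    # Keras is imported lazily so these helpers can be defined without it installed.
    from keras.models import Sequential
    from keras.layers import Dropout, Dense

    model = Sequential()
    if use_cudnn:
        # GPU-only layer backed by cuDNN; assumed available in Keras 2.x.
        from keras.layers import CuDNNLSTM
        model.add(CuDNNLSTM(128, input_shape=(seq_len, n_chars)))
    else:
        from keras.layers import LSTM
        # recurrent_activation='sigmoid' is the setting the answer asks for
        # when re-loading CuDNNLSTM weights into a plain LSTM layer.
        model.add(LSTM(128, input_shape=(seq_len, n_chars),
                       recurrent_activation='sigmoid'))
    model.add(Dropout(0.10))
    model.add(Dense(n_chars, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])
    return model


def transfer_weights(gpu_model, cpu_model, path='lstm_weights.h5'):
    """Save weights from the trained GPU model and load them into the CPU model."""
    gpu_model.save_weights(path)   # hypothetical file name
    cpu_model.load_weights(path)
    return cpu_model
```

With the question's values, one would train build_model(3, 98, use_cudnn=True) on the GPU and then pass it, together with build_model(3, 98, use_cudnn=False), to transfer_weights.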
This is kind of weird to me, but the problem was related to my new AMD CPU, released in April 2018. Having an up-to-date Linux kernel was crucial: following this guide https://itsfoss.com/upgrade-linux-kernel-ubuntu/ I updated the kernel from 4.13 to 4.17, and now everything works.
UPD: The motherboard was crashing the system as well. I have replaced it, and now everything works well.
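To check whether a machine is still on an older kernel like the 4.13 mentioned above, a small helper along these lines may be handy. It is only a sketch: parse_kernel_version and kernel_at_least are hypothetical names, and the 4.17 threshold is simply the version that worked for this poster, not a documented minimum for any CPU.

```python
import re


def parse_kernel_version(release):
    """Return (major, minor) from a release string such as '4.13.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(4, 17)):
    """True if `release` is at or above `minimum` (default: 4.17, from the answer)."""
    # Tuples compare element by element, so (5, 4) >= (4, 17) holds.
    return parse_kernel_version(release) >= minimum
```

On a Linux machine the running kernel's release string is available from platform.release() in the standard library (or uname -r in a shell), so kernel_at_least(platform.release()) reports whether the upgrade has taken effect.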