
Small LSTM model in keras does not fit my GPU

I am programming a relatively small LSTM model in Google Colab.

For reference, I am using TensorFlow 1.13 to build the model, with tensorflow.keras as the Keras API.

from tensorflow.keras import Model
from tensorflow.keras import layers as ll

seq_len = 20000
n_classes = 4

inputs = ll.Input(shape=(seq_len,))
x = ll.Embedding(len(word_index), 1000)(inputs)  # word_index: vocabulary mapping defined elsewhere in the notebook
x = ll.LSTM(units=100, activation='relu', return_sequences=True)(x)
outputs = ll.Dense(units=n_classes, activation='softmax')(x)
model = Model(inputs, outputs)
model.summary()

I have checked that I have 15 GB of GPU RAM available, and according to my estimates the model with a batch size of 32 should fit in 3 GB of RAM.
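As a rough sanity check of that estimate (my own back-of-the-envelope arithmetic, not from the original post), the activation tensors of the two largest layers at batch size 32 come out to roughly:

# Rough activation-memory estimate in float32 (4 bytes per value); a sketch that
# ignores gradients, optimizer state and TensorFlow's own overhead.
batch, seq_len, emb_dim, lstm_units = 32, 20000, 1000, 100

embedding_out = batch * seq_len * emb_dim * 4    # 2,560,000,000 bytes ~ 2.56 GB
lstm_out      = batch * seq_len * lstm_units * 4 # 256,000,000 bytes ~ 0.26 GB

print(embedding_out / 1e9, lstm_out / 1e9)       # ~2.56 GB and ~0.26 GB

So on paper the forward activations are indeed in the ~3 GB ballpark, which makes the out-of-memory behaviour surprising.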

However, whenever I launch the training, the server runs out of memory.

To be fair, I am using extremely long sequences of data (20000 is the maximum sequence length), but I would expect the model to unroll symbolically in memory and just fit.

Reducing the batch size to 1 does not help either.

What is going on? How can I make this model fit in memory?

EDIT: I tried reducing the sequence length to 2, and that indeed makes it fit in memory. But I need the sequence length to remain high. How can I tell TensorFlow not to unroll the network at any point? (I suspect that is what is going on behind the scenes; how can I check whether this is indeed the case?)
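One thing worth checking (my suggestion, not part of the original question) is the unroll argument of the Keras LSTM layer. It defaults to False, which runs the recurrence as a symbolic loop; True unrolls the graph over all time steps and is only intended for short sequences:

# Explicitly keeping the default unroll=False; with a 20000-step sequence,
# unroll=True would try to build 20000 copies of the cell in the graph.
x = ll.LSTM(units=100, activation='relu', return_sequences=True, unroll=False)(x)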

EDIT: If I remove the Softmax layer then the memory use drops back to the normal range. I think the Softmax layer is causing TensorFlow to unroll the network. Wrapping the Softmax in TimeDistributed does not help, though.
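For reference, the TimeDistributed variant mentioned above would look something like this (a sketch of the attempt, reusing the layer names from the question):

# Apply the Dense + softmax independently at every time step; functionally this is
# equivalent to a plain Dense applied to the 3-D LSTM output, which is consistent
# with it not changing the memory behaviour here.
outputs = ll.TimeDistributed(ll.Dense(units=n_classes, activation='softmax'))(x)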

Changing the LSTM layer to the CuDNNLSTM layer did the trick!

inputs = ll.Input(shape=(seq_len,))
x = ll.Embedding(len(word_index), 1024)(inputs)
# CuDNNLSTM runs on a fused cuDNN kernel and uses a fixed tanh activation,
# so the activation argument from the original LSTM layer is dropped.
x = ll.CuDNNLSTM(units=100, return_sequences=True)(x)
x = ll.Dense(units=n_classes, activation='softmax')(x)
outputs = x
model = Model(inputs, outputs)
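A minimal way to train the resulting model might look like the following (my own sketch; the optimizer, loss and the X_train / y_train names are assumptions, not part of the answer):

# Hypothetical training call. Because return_sequences=True, the model predicts a class
# at every time step, so y_train is assumed to be one-hot encoded with shape
# (num_samples, seq_len, n_classes) and X_train to hold word indices of shape (num_samples, seq_len).
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=5)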
