Issue with keras fit_generator epoch
I'm creating an LSTM model for text generation using Keras. The dataset I'm using (around 25 novels, about 1.4 million words) can't be processed at once (converting my outputs with to_categorical() causes a memory issue), so I created a custom generator function to read the data in batches.
# Data generator for fit and evaluate
def generator(batch_size):
    start = 0
    end = batch_size
    while True:
        x = sequences[start:end, :-1]
        y = sequences[start:end, -1]
        y = to_categorical(y, num_classes=vocab_size)
        yield x, y
        if end >= len(lines):
            # wrap around so the generator can keep feeding every epoch
            start = 0
            end = batch_size
        else:
            start += batch_size
            end += batch_size
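The wrap-around matters because fit() keeps drawing batches for every epoch; a generator that stops early will exhaust itself mid-training. A minimal pure-Python sketch of the same cycling logic (toy data standing in for the real sequences array):

```python
def cycling_batches(data, batch_size):
    """Yield successive batches forever, wrapping around at the end of the
    data, so Keras' fit() can draw batches for as many epochs as it needs."""
    start = 0
    while True:
        yield data[start:start + batch_size]
        start += batch_size
        if start >= len(data):
            start = 0  # wrap around for the next epoch

gen = cycling_batches(list(range(5)), batch_size=2)
# Four draws wrap past the end: [0, 1], [2, 3], [4], [0, 1]
print([next(gen) for _ in range(4)])
```

With an infinite generator like this, you tell fit() where an epoch ends via steps_per_epoch (typically math.ceil(num_samples / batch_size)).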
When I execute the model.fit() method, the following error is thrown after 1 epoch of training:
UnknownError: [_Derived_] CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1459): 'cudnnSetTensorNdDescriptor( tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'
[[{{node CudnnRNN}}]]
[[sequential/lstm/StatefulPartitionedCall]] [Op:__inference_train_function_25138]
Function call stack:
train_function -> train_function -> train_function
Does anyone know how to solve this issue? Thanks
According to many sources on the Internet, this issue seems to occur when using an LSTM layer together with a Masking layer while training on GPU.
The workarounds mentioned below may help with this problem:
1. If you can compromise on speed, you can train your model on CPU rather than on GPU. It works without any error.
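One way to force CPU training, assuming a CUDA setup, is to hide the GPU from TensorFlow before it is imported:

```python
import os

# Setting CUDA_VISIBLE_DEVICES to "-1" makes no CUDA device visible, so
# TensorFlow falls back to the CPU. This must run BEFORE importing tensorflow.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf  # import only after the variable is set
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Alternatively, you can wrap the training call in `with tf.device('/CPU:0'):` to pin just that part of the computation to the CPU.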
2. As per this comment, please check whether any of your input sequences consist of all zeros, as the Masking layer may then mask all the inputs.
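A quick way to run that check, sketched here with a hypothetical toy batch (the real check would run on your sequences array):

```python
import numpy as np

# Toy batch: 3 sequences of length 4; the middle one is all zeros, so a
# Masking(mask_value=0) layer would mask every one of its timesteps.
sequences = np.array([[1, 2, 3, 0],
                      [0, 0, 0, 0],
                      [4, 0, 5, 6]])

# Indices of sequences that are entirely zero (i.e. fully masked)
all_zero_rows = np.where((sequences == 0).all(axis=1))[0]
print(all_zero_rows)
```

If this finds any rows, dropping those sequences from the dataset before training is a reasonable first step.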
3. If possible, you can disable eager execution (tf.compat.v1.disable_eager_execution()). As per this comment, it works without any error.
4. Instead of using a Masking layer, you can try the alternatives mentioned in this link:
a. Adding the argument mask_zero = True to the Embedding layer, or
b. Passing a mask argument manually when calling layers that support this argument.
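Conceptually, mask_zero=True makes the Embedding layer treat input id 0 as padding and propagate a boolean mask of shape (batch, timesteps) to downstream layers. A sketch of what that mask looks like, using hypothetical token ids:

```python
import numpy as np

# Two padded sequences of token ids; 0 is the padding id.
token_ids = np.array([[7, 2, 0, 0],
                      [5, 0, 0, 0]])

# The mask Keras would compute: True on real tokens, False on padding.
mask = token_ids != 0
print(mask)
```

Note that with mask_zero=True, index 0 can no longer be used as a real vocabulary entry, so the vocabulary size passed to the Embedding layer must account for it.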
5. The last solution can be to remove the Masking layer, if that is possible.
If none of the above workarounds solves your problem, the Google TensorFlow team is working to resolve this error. We may have to wait until it is fixed.
Hope this information helps. Happy Learning!