
Issue with Keras fit_generator epoch

I'm creating an LSTM model for text generation using Keras. Since the dataset I'm using (around 25 novels, about 1.4 million words) can't be processed at once (a memory issue when converting my outputs with to_categorical()), I created a custom generator function to read the data in.

# Data generator for fit and evaluate
def generator(batch_size):
    start = 0
    end = batch_size
    while True:
        x = sequences[start:end, :-1]
        y = sequences[start:end, -1]
        y = to_categorical(y, num_classes=vocab_size)
        yield x, y
        if end >= len(sequences):
            # Keras expects the generator to yield indefinitely,
            # so wrap around to the start instead of breaking out.
            start, end = 0, batch_size
        else:
            start += batch_size
            end += batch_size
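The wrap-around logic can be sketched independently of Keras. This minimal NumPy version (with hypothetical toy data standing in for `sequences`) shows an infinite batch generator that resets its indices at the end of the data rather than breaking, which is what fit() expects when it starts a second epoch:

```python
import numpy as np

def batch_generator(data, batch_size):
    """Yield (x, y) batches forever, wrapping around at the end of `data`."""
    start = 0
    while True:
        end = start + batch_size
        batch = data[start:end]
        x, y = batch[:, :-1], batch[:, -1]  # inputs, last-column targets
        yield x, y
        start = end if end < len(data) else 0  # wrap around for the next epoch

# Hypothetical toy data: 4 samples, 3 input tokens + 1 target id each.
data = np.arange(16).reshape(4, 4)
gen = batch_generator(data, batch_size=2)
x1, y1 = next(gen)  # rows 0-1
x2, y2 = next(gen)  # rows 2-3
x3, y3 = next(gen)  # wrapped: identical to the first batch
```

A generator that ever raises StopIteration (as the original `break` would cause after one pass) makes fit() fail at the start of the next epoch, so the generator must loop forever.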

When I execute the model.fit() method, the following error is thrown after 1 epoch of training:

UnknownError:  [_Derived_]  CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1459): 'cudnnSetTensorNdDescriptor( tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'
     [[{{node CudnnRNN}}]]
     [[sequential/lstm/StatefulPartitionedCall]] [Op:__inference_train_function_25138]

Function call stack:
train_function -> train_function -> train_function

Does anyone know how to solve this issue? Thanks.

From many sources on the Internet, this issue seems to occur when using an LSTM layer together with a Masking layer while training on GPU.

The following can be workarounds for this problem:

  1. If you can compromise on speed, you can train your model on CPU rather than on GPU. It works without any error.

  2. As per this comment, check whether any of your input sequences consist entirely of zeros, as the Masking layer may then mask all the inputs.

  3. If possible, you can disable eager execution. As per this comment, it then works without any error.

  4. Instead of using a Masking layer, you can try the alternatives mentioned in this link:

    a. Adding the argument mask_zero=True to the Embedding layer, or

    b. Passing a mask argument manually when calling layers that support it.

  5. As a last resort, you can remove the Masking layer, if that is possible.
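As a small illustration of workaround 2, the sketch below (hypothetical toy data; `sequences` stands in for the array from the question, with the target word id in the last column) finds input rows that are entirely zero and would therefore be fully masked by a Masking layer:

```python
import numpy as np

# Hypothetical stand-in for the question's `sequences` array:
# each row is an input sequence plus its target word id in the last column.
sequences = np.array([
    [0, 0, 0, 7],   # input part is all zeros -> fully masked by a Masking layer
    [3, 5, 2, 9],
])

x = sequences[:, :-1]                       # input sequences only
all_zero_rows = np.where(~x.any(axis=1))[0]  # rows with no non-zero token
print("Fully-masked rows:", all_zero_rows.tolist())  # -> [0]
```

If any such rows turn up, dropping them (or padding differently) avoids feeding the LSTM a fully-masked sequence. For workaround 4a, the equivalent change is to pass mask_zero=True to the Embedding layer so the mask is derived from the zero token directly, instead of using a separate Masking layer.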

If none of the above workarounds solves your problem, the Google TensorFlow team is working to resolve this error. We may have to wait until it is fixed.

Hope this information helps. Happy Learning!
