Error when training an LSTM autoencoder: "No gradients provided for any variable"

Question

I've been trying to create a recurrent autoencoder for anomaly detection, but I'm getting a strange error when I try to train the model. Everything else seems to work fine; the error only occurs during training.

Things I've Tried:

I'm pretty sure this isn't a problem with the data I'm feeding the model: I've checked and re-checked that it's in the proper shape, regularized properly, etc.

I tried getting data from a different set of files, and got the same error.

Replacing the model's layers with just LSTM(2, return_sequences=True, input_shape=(512,2)) gives a similar error.

Same error when replacing the layers with just TimeDistributed(Dense(2), input_shape=(512,2)).

Or with Dense(512, input_shape=(512,2)).

I specified the input shape for each layer, and they all seem to be exactly what I expected, so seemingly no issues there.
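The layer shapes can also be checked by hand. Below is a minimal sketch of that arithmetic, assuming Conv1D uses its default 'valid' padding and UpSampling1D its default size of 2; the helper name conv1d_length is hypothetical and not part of Keras.

```python
def conv1d_length(length, kernel_size, strides):
    """Output length of a 1D convolution with 'valid' padding (assumed default)."""
    return (length - kernel_size) // strides + 1


# Encoder: four Conv1D layers, each with kernel_size=4 and strides=4.
length = 512
encoder_lengths = []
for _ in range(4):
    length = conv1d_length(length, kernel_size=4, strides=4)
    encoder_lengths.append(length)

# Flatten: the last Conv1D has 8 filters, so 2 time steps * 8 channels.
flattened = encoder_lengths[-1] * 8

# Decoder: RepeatVector(32), then four UpSampling1D layers that double the length.
decoder_lengths = [32]
for _ in range(4):
    decoder_lengths.append(decoder_lengths[-1] * 2)

print(encoder_lengths)
print(flattened)
print(decoder_lengths)
```

The time-step counts this produces can be compared against the second dimension of each Output Shape in the model summary.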

The Complete Texts:

Here is the error, preceded by the model summary and debug info from the read_CSV and regularize functions, which read and format data from CSV files and regularize that data, respectively.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 512, 4)            112
_________________________________________________________________
conv1d (Conv1D)              (None, 128, 2)            34
_________________________________________________________________
lstm_1 (LSTM)                (None, 128, 4)            112
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 32, 2)             34
_________________________________________________________________
lstm_2 (LSTM)                (None, 32, 4)             112
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 8, 4)              68
_________________________________________________________________
lstm_3 (LSTM)                (None, 8, 4)              144
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 2, 8)              136
_________________________________________________________________
flatten (Flatten)            (None, 16)                0
_________________________________________________________________
repeat_vector (RepeatVector) (None, 32, 16)            0
_________________________________________________________________
lstm_4 (LSTM)                (None, 32, 4)             336
_________________________________________________________________
up_sampling1d (UpSampling1D) (None, 64, 4)             0
_________________________________________________________________
lstm_5 (LSTM)                (None, 64, 2)             56
_________________________________________________________________
up_sampling1d_1 (UpSampling1 (None, 128, 2)            0
_________________________________________________________________
lstm_6 (LSTM)                (None, 128, 2)            40
_________________________________________________________________
up_sampling1d_2 (UpSampling1 (None, 256, 2)            0
_________________________________________________________________
lstm_7 (LSTM)                (None, 256, 2)            40
_________________________________________________________________
up_sampling1d_3 (UpSampling1 (None, 512, 2)            0
_________________________________________________________________
lstm_8 (LSTM)                (None, 512, 2)            40
=================================================================
Total params: 1,264
Trainable params: 1,264
Non-trainable params: 0
_________________________________________________________________
None
read_CSV: Sequence length: 512
read_CSV: Reading from column 1 to column 2
read_CSV: (Column numbers start with 0, so the first column is column 0.)
read_CSV: Opening "data/train/1MT09_gen_14.dat.csv"
read_CSV: File "data/train/1MT09_gen_14.dat.csv" has 67478 lines
read_CSV: Building sequence 0
read_CSV: File "data/train/1MT09_gen_14.dat.csv" has 67478 lines remaining
read_CSV: Yielding sequence
pair_sequence
index  0
Max, min: 1.113525390625 -1.886474609375
Extreme: 1.886474609375
index  1
Max, min: 1.150146484375 -1.849853515625
Extreme: 1.849853515625
Epoch 1/10
Traceback (most recent call last):
  File "autoencoder.py", line 124, in <module>
    train_autoencoder()
  File "autoencoder.py", line 120, in train_autoencoder
    epochs=10
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/keras/engine/training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/eager/def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/eager/def_function.py", line 506, in _initialize
    *args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collec
ted
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/eager/function.py", line 2667, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/eager/def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensor
flow/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/keras/engine/training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/keras/engine/training.py:541 train_step  **
        self.trainable_variables)
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/keras/engine/training.py:1804 _minimize
        trainable_variables))
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/keras/optimizer_v2/optimizer_v2.py:521 _aggregate_gradients
        filtered_grads_and_vars = _filter_grads(grads_and_vars)
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow
/python/keras/optimizer_v2/optimizer_v2.py:1219 _filter_grads
        ([v.name for _, v in grads_and_vars],))

    ValueError: No gradients provided for any variable: ['lstm/lstm_cell/kernel:0', 'lstm/ls
tm_cell/recurrent_kernel:0', 'lstm/lstm_cell/bias:0', 'conv1d/kernel:0', 'conv1d/bias:0', 'l
stm_1/lstm_cell_1/kernel:0', 'lstm_1/lstm_cell_1/recurrent_kernel:0', 'lstm_1/lstm_cell_1/bi
as:0', 'conv1d_1/kernel:0', 'conv1d_1/bias:0', 'lstm_2/lstm_cell_2/kernel:0', 'lstm_2/lstm_c
ell_2/recurrent_kernel:0', 'lstm_2/lstm_cell_2/bias:0', 'conv1d_2/kernel:0', 'conv1d_2/bias:
0', 'lstm_3/lstm_cell_3/kernel:0', 'lstm_3/lstm_cell_3/recurrent_kernel:0', 'lstm_3/lstm_cel
l_3/bias:0', 'conv1d_3/kernel:0', 'conv1d_3/bias:0', 'lstm_4/lstm_cell_4/kernel:0', 'lstm_4/
lstm_cell_4/recurrent_kernel:0', 'lstm_4/lstm_cell_4/bias:0', 'lstm_5/lstm_cell_5/kernel:0',
 'lstm_5/lstm_cell_5/recurrent_kernel:0', 'lstm_5/lstm_cell_5/bias:0', 'lstm_6/lstm_cell_6/k
ernel:0', 'lstm_6/lstm_cell_6/recurrent_kernel:0', 'lstm_6/lstm_cell_6/bias:0', 'lstm_7/lstm
_cell_7/kernel:0', 'lstm_7/lstm_cell_7/recurrent_kernel:0', 'lstm_7/lstm_cell_7/bias:0', 'ls
tm_8/lstm_cell_8/kernel:0', 'lstm_8/lstm_cell_8/recurrent_kernel:0', 'lstm_8/lstm_cell_8/bia
s:0'].

Here is the code:

import tensorflow as tf

import numpy as np

from keras.models import Sequential
from keras.layers import Dense, LSTM, Input, Conv1D, Reshape, RepeatVector, Flatten, UpSampling1D, TimeDistributed

from keras.utils import plot_model

from glob import glob # Hmmm
import os

#> Get Data <#

def regularize(data):
    data_shape = np.array(data).shape
    result = []
    print("pair_sequence")
    sub_result = [[],[]]
    index_range = range(0, 2)
    for index in index_range:
        print("index ", index)
        sequence = [pair[index] for pair in data]
        mean = np.mean(sequence)

        sequence = [int(value) - mean for value in sequence]

        maximum = max(sequence)
        minimum = min(sequence)
        print("Max, min:", maximum, minimum)

        assert maximum > minimum or maximum == minimum == 0

        extreme = max([abs(maximum), abs(minimum)])
        print("Extreme:", extreme)
        sub_result[index].append([value / extreme for value in sequence])
    for index, item in enumerate(sub_result[0][0]):
        result.append([sub_result[i][0][index] for i in index_range])
    assert np.array(result).shape == data_shape
    return result

def read_CSV(
    filename,
    sequence_length = 512,
    read_from_index = 1,
    stop_before_index = 3
):
    """Iterable which yields sequence_length data points at a time from a CSV file"""
    print('read_CSV: Sequence length: ' + str(sequence_length))
    print('read_CSV: Reading from column ' + str(read_from_index) + ' to column ' + str(stop_before_index - 1))
    print('read_CSV: (Column numbers start with 0, so the first column is column 0.)')
    print('read_CSV: Opening "' + filename + '"')
    with open(filename, 'r') as f:
        c = 0
        lines = f.readlines()
        print('read_CSV: File "' + filename + '" has ' + str(len(lines)) + ' lines')
        while len(lines) > sequence_length * (c+1):
            print('read_CSV: Building sequence ' + str(c))
            print('read_CSV: File "' + filename + '" has ' + str(len(lines) - sequence_length * c) + ' lines remaining')
            sequence = []
            for i, line in enumerate(lines[(sequence_length*c):(sequence_length*(c+1))]):
                record = line.rstrip().split(',')
                sequence.append([np.float32(n) for n in record[read_from_index:stop_before_index]])
            assert np.array(sequence).shape == (sequence_length, abs(stop_before_index - read_from_index))
            print('read_CSV: Yielding sequence')
            sequence = regularize(sequence)
            assert np.array(sequence).shape == (sequence_length, abs(stop_before_index - read_from_index))
            yield sequence
            c += 1

def read_multiple_CSVs(filenames):
    """
    Iterable which, like read_CSV, yields a number of data points at a time, but across multiple CSV files, using read_CSV to read each one.
    Actually returns a numpy array which contains the sequence twice.
    """
    for filename in filenames:
        for sequence in read_CSV(filename):
            yield np.array([sequence, sequence])

def find_CSVs(directory):
    """Produces a list of files in the given directory whose names end in '.csv'"""
    CSV_list = glob(directory + '/*.csv')
    return CSV_list

train_CSV_list = find_CSVs('data/train') # Should be 100% normal data, absolutely NO EARTHQUAKES allowed

#> Define Neural Net Structure <#

autoencoder = Sequential([ # Takes a series of 512 pairs of North/South and East/West magnetometer readings
    LSTM(4, return_sequences=True, input_shape=(512, 2)),
    Conv1D(2, 4, strides=4, input_shape=(512, 4)),
    LSTM(4, return_sequences=True, input_shape=(128, 2)),
    Conv1D(2, 4, strides=4, input_shape=(128, 4)),
    LSTM(4, return_sequences=True, input_shape=(32, 2)),
    Conv1D(4, 4, strides=4, input_shape=(32, 4)),
    LSTM(4, return_sequences=True, input_shape=(8, 4)),
    Conv1D(8, 4, strides=4, input_shape=(8, 4)),
    Flatten(input_shape=(2, 8)),
    RepeatVector(32),
    LSTM(4, return_sequences=True, input_shape=(32, 16)),
    UpSampling1D(input_shape=(32, 4)),
    LSTM(2, return_sequences=True, input_shape=(64, 4)),
    UpSampling1D(input_shape=(64, 2)),
    LSTM(2, return_sequences=True, input_shape=(128, 2)),
    UpSampling1D(input_shape=(128, 2)),
    LSTM(2, return_sequences=True, input_shape=(256, 2)),
    UpSampling1D(input_shape=(256, 2)),
    LSTM(2, return_sequences=True, input_shape=(512, 2)),
]) # Outputs a reconstructed series of 512 magnetometer readings.

#> Configure the Training Hyperparameters <#

autoencoder.compile(loss='mean_squared_error', optimizer='nadam')

#> Train the Autoencoder! <#

def train_autoencoder():
    autoencoder.fit(
        read_multiple_CSVs(train_CSV_list),
        epochs=10
    )

print(autoencoder.summary())
train_autoencoder()

Answer 1

Input doesn't return a layer, so the code below is wrong:

autoencoder = Sequential([ 
    Input((512,2)), #<-- This is wrong
.....

A Sequential model should be supplied with a list of layers.

Fix:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Input, Conv1D, Reshape, RepeatVector, Flatten, UpSampling1D

from keras.utils import plot_model
import os

autoencoder = Sequential([
    LSTM(4, return_sequences=True),
    Conv1D(2, 4, strides=4),
    LSTM(4, return_sequences=True),
    Conv1D(2, 4, strides=4),
    LSTM(4, return_sequences=True),
    Conv1D(4, 4, strides=4),
    LSTM(4, return_sequences=True),
    Conv1D(8, 4, strides=4),
    #LSTM(1, return_sequences=True),
    Flatten(),
    RepeatVector(32),
    LSTM(2, return_sequences=True),
    UpSampling1D(),
    LSTM(2, return_sequences=True),
    UpSampling1D(),
    LSTM(2, return_sequences=True),
    UpSampling1D(),
    LSTM(2, return_sequences=True),
    UpSampling1D(),
    LSTM(2, return_sequences=True),
])

autoencoder.compile(loss='mean_squared_error', optimizer='nadam')

autoencoder.fit(np.random.randn(10,512,16), np.random.randn(10,512,2))

Output:

Epoch 1/1
10/10 [==============================] - 6s 568ms/step - loss: 1.0235

Input() is used to instantiate a Keras tensor, and it returns a tensor. However, a Sequential model requires layers. See the Keras docs.

Also check the answer by kgangadhar for this question.

Answer 2

In the read_multiple_CSVs function, I changed yield np.array([sequence, sequence]) to yield (np.array([sequence]), np.array([sequence])).

Now it works.

Looks like passing a numpy array doesn't work (I tried using one with the correct shape), and that the shape I needed was (2, 1, 512, 2), as a tuple of numpy arrays. Slightly strange, but it works.
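A likely explanation, as far as I understand Keras's handling of generators: an item that is a bare array is treated as inputs only, so no targets reach the loss and no gradients can be computed, whereas a tuple is read as (inputs, targets). The sketch below compares the two yields using only numpy. The sequences function is a hypothetical stand-in for read_CSV that produces random data, and the names broken_batches and fixed_batches are illustrative.

```python
import numpy as np


def sequences(count=3, length=512):
    """Hypothetical stand-in for read_CSV: yields lists of `length` value pairs."""
    rng = np.random.default_rng(0)
    for _ in range(count):
        yield rng.uniform(-1.0, 1.0, size=(length, 2)).tolist()


def broken_batches(source):
    # Original behaviour: a single array of shape (2, 512, 2).
    for sequence in source:
        yield np.array([sequence, sequence])


def fixed_batches(source):
    # Behaviour after the change: a tuple (inputs, targets),
    # each with a leading batch dimension of 1.
    for sequence in source:
        batch = np.array([sequence], dtype=np.float32)
        yield batch, batch


broken = next(broken_batches(sequences()))
fixed = next(fixed_batches(sequences()))

print(type(broken).__name__, broken.shape)
print(type(fixed).__name__, [part.shape for part in fixed])
```

For an autoencoder the targets are the inputs themselves, which is why the same array appears in both positions of the tuple.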
