
Tensorflow/Keras: Output shape mismatch in training data and predicted data

I have an LSTM model defined in tensorflow/keras as below. I am including only the relevant details pertaining to the question.

from tensorflow.keras.layers import Input, BatchNormalization, LSTM, Dense
from tensorflow.keras.models import Model

t_steps = 60
n_features = 3

def LSTMModel():
  input = Input(shape=(t_steps, n_features))
  l1 = BatchNormalization()(input)
  l2 = LSTM(160, return_sequences=True)(l1)
  l3 = LSTM(80, return_sequences=True)(l2)
  l4 = LSTM(10, return_sequences=True)(l3)
  l5 = Dense(1, activation='relu')(l4)
  model = Model(inputs=input, outputs=l5)
  model.compile(loss="mean_squared_error", optimizer='adam')
  return model

# shape of X_train_seq is (275268, 60, 3)
# shape of Y_train_seq is (275268,)

model = LSTMModel()
model.fit(X_train_seq, Y_train_seq, epochs=n_epochs,
  batch_size=batch_size,verbose=1,initial_epoch=init_epoch)

Then when I predict using this model on an X_test_seq of shape (30355, 60, 3), I get Y_test_seq_pred of shape (30355, 60, 1), while I expect a prediction of shape (30355,). This happened because the l4 line in the above code should have been

l4 = LSTM(10,return_sequences=False)(l3)

My question is: with the original code, why didn't Tensorflow/Keras give an error or produce any sort of warning during training? The shape of Y_train_seq passed to the fit() method is (275268,), and during optimization it must have been internally predicting a Y of shape (batch_size, 60, 1) for every batch and comparing it with a slice of shape (batch_size,) from Y_train_seq. How come this dimension mismatch in Y still let the code continue, only for me to find out about it at the end of training (a quick check that exposes the mismatch is sketched below)? I am sure there must be some reason behind it and I want to know what is going on. Thank you!!
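
For reference, a quick sanity check (a sketch using the definitions above) that exposes the output-shape mismatch before training starts:

model = LSTMModel()
model.summary()             # the last layer reports output shape (None, 60, 1)
print(model.output_shape)   # (None, 60, 1), i.e. one value per timestep
print(Y_train_seq.shape)    # (275268,), i.e. one value per sequence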

This occurs due to what is called NumPy broadcasting, which describes how NumPy treats arrays with different shapes.

Let's first define two tensors A and B with the following shapes:

dim(A) = (32, 60, 1)
dim(B) = (32,)

For example, if you try to perform some operation such as subtraction, addition, or any other element-wise operation between A and B, Tensorflow represents B's dimensions by, in effect, treating the missing dimensions as "don't care":

dim(B) = (do_not_care, do_not_care, 32)

So at each iteration, a scalar value x is subtracted from an array of 32 elements (values); therefore the output is a column that contains 32 different values.

dim(output) = (32, 60, 32), which is valid, but unexpected behavior.
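
This can be verified directly with NumPy (a small sketch using the shapes above; the values are irrelevant):

import numpy as np

A = np.zeros((32, 60, 1))
B = np.zeros((32,))
print((A - B).shape)   # (32, 60, 32) -- the subtraction broadcasts silently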

Such an operation could be implemented recursively: we iterate through the tensor with the higher number of dimensions, starting from the 1st dimension (axis), down to the known dimension of the second tensor; then we apply the required operation to the resulting chunked tensor if the shapes match, in other words if the broadcasting succeeds.


An implementation of broadcasting for subtraction could be done as follows.

def supstract(a, b): # a and b are numpy.array()

  # base case: both tensors have the same number of (squeezed) dimensions
  if a.squeeze().ndim == b.squeeze().ndim:
    # the last axes are broadcast-compatible if they match or either one is 1
    if a.shape[-1] == b.shape[-1] or a.shape[-1] == 1 or b.shape[-1] == 1:
      print(a - b, '\n')
      return
    else:
      raise ValueError(f'could not broadcast, {a.shape} != {b.shape}')

  # otherwise recurse through the tensor with the higher number of dimensions
  elif a.ndim > b.ndim:
    for sub_a in a:
      supstract(sub_a, b) # recursive call
  else:
    for sub_b in b:
      supstract(a, sub_b) # recursive call

I have used the following example to check the implementation, so I may have missed some conditions (handling of different cases).

import numpy as np
a = np.random.randint(1, 4, (2, 2, 6, 1))
b = np.random.randint(1, 4, (4, ))
supstract(a, b)   # prints four broadcast chunks, each of shape (6, 4)

In your case, Tensorflow needs Y to compute the loss function; you are using MSE, which is just a scalar value, computed using the Keras backend as follows.

from tensorflow.keras import backend as K

def mean_squared_error(y_true, y_pred):
  return K.mean((y_pred - y_true) ** 2) # broadcasting (y_pred - y_true)
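
This is exactly why no error is raised: with y_true of shape (batch_size,) and y_pred of shape (batch_size, 60, 1), the subtraction broadcasts and the mean still collapses everything to a single scalar. A NumPy sketch with the shapes from the question (arbitrary values, batch_size = 32):

import numpy as np

y_true = np.random.rand(32)           # labels, shape (32,)
y_pred = np.random.rand(32, 60, 1)    # model output, shape (32, 60, 1)

diff = y_pred - y_true                # broadcasts silently to (32, 60, 32)
loss = np.mean(diff ** 2)             # a single scalar, so training proceeds
print(diff.shape, loss)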

Update

|1 2 3| + |7 8 9|
|4 5 6|

|1 2 3| + 7 = |8  9  10|
|4 5 6|       |11 12 13|
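
Both examples can be reproduced with NumPy (a small sketch; the values match the matrices above):

import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m + np.array([7, 8, 9]))   # [[ 8 10 12]
                                 #  [11 13 15]]
print(m + 7)                     # [[ 8  9 10]
                                 #  [11 12 13]]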

This is based on the Tensorflow documentation describing broadcasting. NumPy (broadcasting) handles any of the above operations, even though mathematically you cannot apply any of them. If you wanted a warning for every broadcasting operation, you would get tons of warnings.
