
Meaning of batch_size in model.evaluate()

I am building a plain vanilla FNN and want to evaluate my model after training. I was wondering what impact the batch_size has when evaluating the model on a test set. Of course it is relevant for training, as it determines the number of samples to be fed to the network before computing the next gradient. It is also clear that it can be needed when predicting values for a (stateful) RNN. But it is not clear to me why it is needed when evaluating the model, especially an FNN. Furthermore, I get slightly different values when I evaluate the model on the same test set but with different batch sizes. Consider the following toy example:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# function to be learned
def f(x):
    return x[0] + x[1] + x[2]

# sample training and test points on a rectangular grid
x_train = np.random.uniform(low = -10, high = 10, size = (50,3))
y_train = np.apply_along_axis(f, 1, x_train).reshape(-1,1)

x_test = np.random.uniform(low = -10, high = 10, size = (50,3))
y_test = np.apply_along_axis(f, 1, x_test).reshape(-1,1)

model = Sequential()
model.add(Dense(20, input_dim = 3, activation = 'tanh'))
model.add(Dense(1))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mse',
      optimizer=sgd)
model.fit(x_train, y_train, batch_size = 10, epochs = 30, verbose = 0)

# evaluate on the same test data with different batch sizes
model.evaluate(x_test, y_test, batch_size = 10)
model.evaluate(x_test, y_test, batch_size = 20)
model.evaluate(x_test, y_test, batch_size = 30)
model.evaluate(x_test, y_test, batch_size = 40)
model.evaluate(x_test, y_test, batch_size = 50)

The values are very similar but nevertheless different. Where does this come from? Shouldn't the following always be true?

from sklearn.metrics import mean_squared_error as mse
0 == model.evaluate(x_test, y_test) - mse(model.predict(x_test), y_test)

No, they don't have to be the same. If you combine floating-point math with parallelism, you don't get reproducible results, since (a + b) + c is not necessarily the same as a + (b + c).
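A minimal sketch in plain NumPy (independent of Keras) illustrating the point: the same numbers summed with a different grouping can give slightly different results.

import numpy as np

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))   # False: floating-point addition is not associative

# summing the same array in chunks changes the grouping of the additions
x = np.random.uniform(-10, 10, size=10000).astype(np.float32)
chunked_sum = sum(x[i:i + 10].sum() for i in range(0, len(x), 10))
print(chunked_sum - x.sum())        # typically a tiny non-zero difference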

The evaluate function of Model takes a batch size simply to speed up evaluation: the network can process multiple samples at a time, and on a GPU this makes evaluation much faster. I think the only way to reduce this effect would be to set batch_size to one.
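For intuition, here is a rough sketch of what batched evaluation amounts to for an MSE loss, assuming the model, x_test and y_test from the question are in scope (batched_mse is just an illustrative helper, not a Keras API). Mathematically, the size-weighted average of per-batch MSEs equals the overall MSE, so the batch size only changes the order of the floating-point operations.

import numpy as np

def batched_mse(model, x, y, batch_size):
    # size-weighted average of per-batch MSEs, roughly mimicking batched evaluation
    total = 0.0
    for i in range(0, len(x), batch_size):
        xb, yb = x[i:i + batch_size], y[i:i + batch_size]
        preds = model.predict(xb, verbose=0)
        total += np.mean((preds - yb) ** 2) * len(xb)
    return total / len(x)

for bs in (10, 20, 30, 40, 50):
    print(bs, batched_mse(model, x_test, y_test, bs))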

The evaluation values differ simply because floating-point values have limited precision.

The reason for using a batch size in evaluate is the same as the reason for using it in training. And the reason is not, as you said:

it is relevant for training as it determines the number of samples to be fed to the network before computing the next gradient

Just think about it: why can't you feed the whole dataset at once, without batches? Because your RAM cannot hold all of it at once. And this is also the reason when evaluating.
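As an illustration, a minimal sketch of evaluating in memory-bounded chunks, assuming a Keras model trained with an MSE loss; evaluate_in_chunks is a hypothetical helper, not a Keras API:

import numpy as np

def evaluate_in_chunks(model, x, y, chunk_size=256):
    # only one chunk of predictions/activations is held in memory at a time
    sq_err_sum = 0.0
    for i in range(0, len(x), chunk_size):
        preds = model.predict(x[i:i + chunk_size], verbose=0)
        sq_err_sum += np.sum((preds - y[i:i + chunk_size]) ** 2)
    return sq_err_sum / y.size   # mean squared error over all outputs

print(evaluate_in_chunks(model, x_test, y_test))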
