
Validation data separate from training data in TensorFlow 2?

When training a model with TensorFlow 2 as shown below, should the validation data be separated from the training data before it is passed to the model's fit method, or can it be part of the training set? The two options are shown at the end of the code below. I believe option 1 is the correct one, but since I have seen some sources use option 2, I want to make sure I understand this correctly.

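from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Hold out part of the data before any training takes place
X_train, X_test, y_train, y_test = train_test_split(df_x, series_y)

best_weight_path = 'best_weights.hdf5'

# Full data set (used only by option 2)
numpy_x = df_x.to_numpy()
numpy_y = series_y.to_numpy()

# Training and held-out portions
numpy_x_train = X_train.to_numpy()
numpy_y_train = y_train.to_numpy()
numpy_x_test = X_test.to_numpy()
numpy_y_test = y_test.to_numpy()

model = Sequential()
# input_dim is the number of feature columns
model.add(Dense(32, input_dim=numpy_x.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
# Stop once val_loss has not improved by at least 1e-3 for 5 epochs
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')
# Keep only the weights of the best epoch
checkpointer = ModelCheckpoint(filepath=best_weight_path, verbose=0, save_best_only=True)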

Option 1

model.fit(numpy_x_train, numpy_y_train, validation_data=(numpy_x_test, numpy_y_test), callbacks=[monitor, checkpointer], verbose=0, epochs=1000)

Option 2

model.fit(numpy_x, numpy_y, validation_data=(numpy_x_test, numpy_y_test), callbacks=[monitor, checkpointer], verbose=0, epochs=1000)

The first option is correct: you split the data beforehand, fit on the training portion, and evaluate on the test/validation portion.
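To make the difference between the two options concrete, here is a minimal sketch that uses plain row indices instead of real data, so no TensorFlow is needed. The 100 rows and the 75/25 split are assumptions chosen for illustration; they are not taken from the question.

```python
import random

# Hypothetical stand-in for the rows of df_x: 100 row indices.
n_samples = 100
all_rows = list(range(n_samples))

# Mimic a 75/25 train/test split by shuffling the row indices once.
rng = random.Random(0)
shuffled = all_rows[:]
rng.shuffle(shuffled)
train_rows = set(shuffled[:75])
test_rows = set(shuffled[75:])

# Option 1: fit on the training rows, validate on the held-out rows.
overlap_option_1 = train_rows & test_rows

# Option 2: fit on every row, validate on the held-out rows.
overlap_option_2 = set(all_rows) & test_rows

print(len(overlap_option_1))  # 0: no validation row is seen during training
print(len(overlap_option_2))  # 25: every validation row is also a training row
```

With option 2 the model has already been fitted on every row it is later validated on, so val_loss (and therefore EarlyStopping and ModelCheckpoint) no longer measures performance on unseen data.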

The second option is not correct: you train on all of your data while passing a subset of that same data for evaluation. Keras does not detect this overlap, so the validation samples are also seen during training and the validation loss will look better than it really is. To achieve what you are looking for with the second option, you simply need validation_split = 0.xxx

model.fit(numpy_x, numpy_y, validation_split=0.xxx, callbacks=[monitor, checkpointer], verbose=0, epochs=1000)

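In other words, you pass all of your data to fit and Keras holds out a fraction 0.xxx of it for evaluation/testing. Note that this held-out fraction is taken from the last samples of the arrays, before shuffling, rather than drawn at random, so shuffle the data yourself first if it is ordered (for example, sorted by class).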
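One detail worth checking against the Keras documentation: validation_split is described there as holding out the last fraction of the samples, before any shuffling. The sketch below mimics that slicing on plain Python lists to show why ordered data should be shuffled first. The helper split_like_validation_split, the value 0.25, and the sorted labels are all hypothetical and for illustration only; this is not Keras code.

```python
import random

def split_like_validation_split(samples, validation_split):
    """Hold out the last fraction of samples (a sketch of the documented behaviour)."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

# Hypothetical labels sorted by class: 75 zeros followed by 25 ones.
labels = [0] * 75 + [1] * 25

train_part, val_part = split_like_validation_split(labels, 0.25)
print(set(train_part))  # {0}: the training part never sees class 1
print(set(val_part))    # {1}: the validation part contains only class 1

# Shuffling before the split mixes the classes across both parts.
# With real data, apply the same permutation to features and labels.
rng = random.Random(0)
shuffled = labels[:]
rng.shuffle(shuffled)
train_shuffled, val_shuffled = split_like_validation_split(shuffled, 0.25)
print(len(train_shuffled), len(val_shuffled))
```

If the data comes from a DataFrame that is sorted in any meaningful way, shuffling it once before calling to_numpy() avoids this problem.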
