

Keras fit using validation_split gets higher results than using validation_data

I am using the following fit function:

history = model.fit(x=[X1_train, X2_train, X3_train],
                y=y_train,
                batch_size=50,
                epochs=20,
                verbose=2,
                validation_split=0.3,
                #validation_data=([X1_test, X2_test, X3_test], y_test),
                class_weight={0:1, 1:10})

and getting an average val_acc of 0.7. But when running again, this time with the validation_data option (using held-out data from the same dataset, about 30% the size of the training data), I am getting an average val_acc of 0.35. Any reasons for such a difference?
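One thing worth checking is what validation_split=0.3 actually evaluates on: Keras simply takes the last 30% of the training arrays, without shuffling first. The sketch below (with synthetic, illustrative data; the variable names are not from the question) shows how to compare the class proportions in that tail against the full set:

```python
import numpy as np

# Synthetic imbalanced labels, mimicking the ~1:10 class_weight in the question.
rng = np.random.default_rng(0)
y_train = (rng.random(1000) < 0.1).astype(int)  # roughly 10% positives

# validation_split=0.3 takes the LAST 30% of the training arrays as-is.
split_at = int(len(y_train) * (1 - 0.3))
tail = y_train[split_at:]

print("positive rate, full set:", y_train.mean())
print("positive rate, last 30%:", tail.mean())
```

If the two rates differ noticeably (or differ from the rate in your held-out validation_data), the two validation metrics are being computed on differently balanced samples and are not directly comparable.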

As requested by the OP, I am posting my comment as an answer and will try to elaborate:

When you set the validation_split argument, the validation samples are selected from the last samples in the training data and labels (i.e. X_train and y_train). Now, in this specific case, if the proportion of class labels in these selected samples is not the same as the proportion of class labels in the data you provide via the validation_data argument, then you should not necessarily expect the validation loss to be the same in the two cases. That is simply because your model may have different accuracy on each of the classes.

