
Why is training accuracy at 99% but the prediction accuracy at 81% on the same data?

I looked up similar questions, but I still don't understand why I get this result. Is it normal for a model to train up to 99% accuracy but, when used to predict on the exact same data, give a lower accuracy (81% in this case)? Shouldn't it return 99% accuracy?

Furthermore, when I present new, unseen data, the prediction accuracy is an abysmal 17%. Surely this cannot be right. I understand that accuracy on new data should be lower than the training accuracy, but surely not as low as 17%.

Here is the code for context. I added comments for easier reading:

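# Imports assumed (not shown in the original post)
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, Dense, Dropout, GlobalAveragePooling1D, MaxPooling1D
from tensorflow.keras.models import Sequential

# df (a pandas DataFrame) and columns_index_list are defined elsewhere and not shown
# Step 1) Split Data into Training and Prediction Sets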
num_split_df_at = int(0.75*len(df))
np_train_data = df.iloc[0:num_split_df_at, columns_index_list].to_numpy()
np_train_target = list(df.iloc[0:num_split_df_at, 4])
np_predict_data = df.iloc[num_split_df_at:len(df), columns_index_list].to_numpy()
np_predict_target = list(df.iloc[num_split_df_at:len(df), 4])

# Step 2) Split Training Data into Training and Validation Sets
x_train, x_test, y_train, y_test = train_test_split(np_train_data, np_train_target, random_state=0)


# Step 3) Reshape Training and Validation Sets to (49, 5)
# prints: "(3809, 245)"
print(x_train.shape)
# prints: "(1270, 245)"
print(x_test.shape)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1] // 5, 5)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1] // 5, 5)
y_train = np.array(y_train) - 1
y_test = np.array(y_test) - 1
# prints: "(3809, 49, 5)"
print(x_train.shape)
# prints: "[0 1 2 3 4 5 6 7 8 9]"
print(np.unique(y_train))
# prints: "10"
print(len(np.unique(y_train)))

input_shape = (x_train.shape[1], 5)

# Step 4) Run Model
adam = keras.optimizers.Adam(learning_rate=0.0001)
model = Sequential()
model.add(Conv1D(512, 5, activation='relu', input_shape=input_shape))
model.add(Conv1D(512, 5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(512, 5, activation='relu'))
model.add(Conv1D(512, 5, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    
model.fit(x_train, y_train, batch_size=128, epochs=150, validation_data=(x_test, y_test))
# summary() prints the table itself and returns None, so no print() is needed
model.summary()
model.save('model_1')

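# Step 5) Predict on Exact Same Trained Data - Should Return High Accuracy
# (np_train_data is the full 75% split, i.e. x_train plus x_test from Step 2)
np_train_data = np_train_data.reshape(np_train_data.shape[0], np_train_data.shape[1] // 5, 5)
np_train_target = np.array(np_train_target) - 1
# argmax over the softmax outputs gives the predicted class, as predict_classes did
predict_results = np.argmax(model.predict(np_train_data), axis=-1)
print(accuracy_score(np_train_target, predict_results))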

# Step 6) Predict on the Held-Out Prediction Set (Last 25% of df, Never Used in Training)
np_predict_data = np_predict_data.reshape(np_predict_data.shape[0], np_predict_data.shape[1] // 5, 5)
np_predict_target = np.array(np_predict_target) - 1
predict_results = np.argmax(model.predict(np_predict_data), axis=-1)
print(accuracy_score(np_predict_target, predict_results))

Here are the prediction results:

(image not available)

My input data looks similar to this, with 49 days and 5 data points per day: (image not available)
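For reference, here is a minimal sketch of what the reshape in Step 3 does to one flattened row, written in plain Python so it runs without NumPy. It assumes the 245 columns are ordered day by day (all 5 values of day 1, then day 2, and so on). That ordering is an assumption, since the column layout is not shown, and the helper name `to_days` is hypothetical.

```python
def to_days(flat_row, points_per_day=5):
    """Split one flat row into consecutive chunks of `points_per_day` values.

    This mirrors NumPy's default (row-major) reshape for a single sample:
    flat index i ends up at day i // points_per_day, position i % points_per_day.
    """
    if len(flat_row) % points_per_day != 0:
        raise ValueError("row length must be a multiple of points_per_day")
    return [flat_row[i:i + points_per_day]
            for i in range(0, len(flat_row), points_per_day)]


# Stand-in for one sample: 245 values, where each value is its own column index.
flat_row = list(range(245))
days = to_days(flat_row)

print(len(days), len(days[0]))  # number of days, values per day
print(days[0])                  # the first day's 5 values
print(days[48])                 # the last day's 5 values
```

If the columns were instead grouped by feature (all 49 values of one feature first), this reshape would mix features across days, so the column order is worth double-checking.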

My possible output classes are:

[1 2 3 4 5 6 7 8 9 10], converted to [0 1 2 3 4 5 6 7 8 9] for "sparse_categorical_crossentropy"
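Why the shift to 0-based labels matters can be sketched in plain Python: sparse categorical cross-entropy uses the integer label as an index into the 10 softmax outputs, so the valid labels are 0 through 9. The function below is a hypothetical, simplified stand-in for the Keras loss (one sample, no clipping or batching), not its actual implementation, and the probabilities are made up.

```python
import math


def sparse_cce(probs, label):
    """Cross-entropy for one sample: -log of the probability at index `label`."""
    return -math.log(probs[label])


# A made-up softmax output over 10 classes (sums to 1).
probs = [0.05] * 9 + [0.55]

original_label = 10                 # labels in the raw data run from 1 to 10
shifted_label = original_label - 1  # 0-based label, as in Step 3

loss = sparse_cce(probs, shifted_label)
print(round(loss, 4))

try:
    sparse_cce(probs, original_label)
    out_of_range = False
except IndexError:
    # Index 10 does not exist in a 10-element output.
    out_of_range = True
print(out_of_range)
```

In this sketch the unshifted label 10 simply falls outside the output vector, which is why the targets are reduced by 1 before training.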

This is because Keras computes the training accuracy and loss batch by batch and then averages them over the epoch, while the weights are still being updated from one batch to the next. The validation metrics, by contrast, are computed in one pass over all the data passed, using the weights as they are at the end of the epoch.
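The effect can be sketched without Keras. The toy below is an illustration only: a one-parameter threshold "model" with a hand-written update rule, where every name (`batch_accuracy`, `threshold`, `learning_rate`) is hypothetical. It records accuracy on each batch before the update, the way a running training metric does, and then evaluates once on the same data with the final parameter.

```python
def batch_accuracy(threshold, batch):
    """Fraction of (x, y) pairs where the rule `x >= threshold` matches y."""
    hits = sum(1 for x, y in batch if int(x >= threshold) == y)
    return hits / len(batch)


# Toy data: the label is 1 when x >= 10. Low and high values are interleaved
# so that every batch of 4 contains both classes.
order = [v for pair in zip(range(10), range(10, 20)) for v in pair]
data = [(x, int(x >= 10)) for x in order]
batches = [data[i:i + 4] for i in range(0, len(data), 4)]

threshold = 0.0      # the model's single parameter, deliberately bad at first
learning_rate = 1.0
per_batch_acc = []

for batch in batches:
    # The metric is recorded with the parameter as it is *before* this update.
    per_batch_acc.append(batch_accuracy(threshold, batch))
    false_pos = sum(1 for x, y in batch if x >= threshold and y == 0)
    false_neg = sum(1 for x, y in batch if x < threshold and y == 1)
    threshold += learning_rate * (false_pos - false_neg)

# What a running training metric reports: the mean over the epoch's batches.
training_acc = sum(per_batch_acc) / len(per_batch_acc)
# What an end-of-epoch evaluation reports: final parameter, all data at once.
final_acc = batch_accuracy(threshold, data)

print(training_acc, final_acc)
```

Here the epoch average comes out at 0.5 while the end-of-epoch evaluation on the very same data is 1.0, because the early batches were scored with a worse parameter than the final one. The gap can go in either direction in a real network.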

This is simple to verify with a dummy example. We train a NN and pass the training data itself as validation data. That way we can compare (a) the training accuracy, (b) the validation accuracy, and (c) the accuracy_score at the end of training. As we can see, (b) = (c), but (a) differs from both (b) and (c) for the reason given above.

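# Imports assumed (not shown in the original post)
import numpy as np
from sklearn.metrics import accuracy_score
from tensorflow.keras.layers import Conv1D, Dense, Dropout, GlobalAveragePooling1D, MaxPooling1D
from tensorflow.keras.models import Sequential

timestamp, features, n_sample = 45, 2, 1000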
n_class = 10
X = np.random.uniform(0,1, (n_sample, timestamp, features))
y = np.random.randint(0,n_class, n_sample)

model = Sequential()
model.add(Conv1D(8, 3, activation='relu', input_shape=(timestamp, features)))
model.add(MaxPooling1D(3))
model.add(Conv1D(8, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(n_class, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
history = model.fit(X, y, batch_size=128, epochs=5, validation_data=(X, y))

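print(history.history['accuracy'][-1])  # (a) training accuracy, averaged over the batches of the last epoch
print(history.history['val_accuracy'][-1])  # (b) validation accuracy at the end of the last epoch
print(accuracy_score(y, np.argmax(model.predict(X), axis=1)))  # (c) accuracy computed from predict()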
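Line (c) turns each row of predicted probabilities into a class index with argmax and then compares the result against the true labels. A minimal plain-Python sketch of that computation follows, with made-up probabilities and hypothetical helper names (`argmax`, `accuracy`) standing in for the NumPy and scikit-learn calls:

```python
def argmax(row):
    """Index of the largest value in `row` (the first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up softmax outputs for 4 samples over 3 classes.
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
]
y_true = [0, 1, 2, 0]

y_pred = [argmax(row) for row in probabilities]
print(y_pred)
print(accuracy(y_true, y_pred))
```

Step 5 and Step 6 of the question rely on the same argmax-then-compare computation.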
