How do I test my trained model with a different dataset in machine learning?

Question

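Hello, I am very new to Python and machine learning, and I have run into a problem. After splitting my data and finishing training and testing my model, I now need to test it on a completely different dataset.

Here is how I set up training and testing:

# Using a Naive Bayes classifier model
import sklearn.naive_bayes

nb_model = sklearn.naive_bayes.MultinomialNB()
nb_model.fit(X_train_v, y_train)
y_pred_class = nb_model.predict(X_test_v)
y_pred_probs = nb_model.predict_proba(X_test_v)

What do I need to adjust to change the dataset I am using, so that I can run a new dataset through the trained model?

Thank you for your time and help!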

Answer 1

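Specifically, in terms of features, your new dataset must have the same number of features as the data the model was trained on.

If x_train.shape gives (752, 8), then you know it has 8 features and 752 samples.

Once your model has been trained, you can verify that model.n_features_in_ (available in scikit-learn 0.24 and later) gives you 8.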
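A minimal sketch of that check, assuming scikit-learn 0.24 or later. The count data here is randomly generated purely for illustration, and the names `X_train`, `y_train` and `model` are placeholders rather than the asker's actual variables.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data: 752 samples, 8 non-negative count features
rng = np.random.default_rng(0)
X_train = rng.integers(0, 100, size=(752, 8))
y_train = rng.integers(0, 3, size=752)

model = MultinomialNB()
model.fit(X_train, y_train)

# The second entry of the shape is the number of features
print(X_train.shape)         # (752, 8)
# The fitted model records how many features it was trained on
print(model.n_features_in_)  # 8
```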

Your model can now predict outputs from data that has 8 features:

import numpy as np
# 10 randomly generated samples with 8 features
new_dataset_1 = np.random.randint(0, 100, size=(10, 8))
new_pred_1 = model.predict(new_dataset_1)
# > array([47, 15,  2, 81, 99, 63, 53, 55, 24, 47])
new_pred_1.shape
# > (10, )  # One predicted class per sample

If you try to predict from data with any other number of features, it will fail:

# 10 randomly generated samples with 9 features
new_dataset_2 = np.random.randint(0, 100, size=(10, 9))
new_pred_2 = model.predict(new_dataset_2)
# > ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0,
# with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 8 is different from 9)
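The exact error text depends on the library version, so it can help to check the feature count yourself before predicting. The helper below, `predict_checked`, is a hypothetical name introduced only for this sketch; it assumes a dense array and a fitted scikit-learn estimator that exposes `n_features_in_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def predict_checked(model, X_new):
    """Predict only if X_new has the feature count the model was fitted on."""
    X_new = np.asarray(X_new)  # assumes dense input, not a sparse matrix
    expected = model.n_features_in_
    if X_new.ndim != 2 or X_new.shape[1] != expected:
        raise ValueError(
            f"expected a 2-D array with {expected} features, got shape {X_new.shape}"
        )
    return model.predict(X_new)

# Hypothetical model trained on 8 count features
rng = np.random.default_rng(0)
model = MultinomialNB().fit(
    rng.integers(0, 100, size=(50, 8)),
    rng.integers(0, 2, size=50),
)

good = predict_checked(model, rng.integers(0, 100, size=(10, 8)))
print(good.shape)  # (10,)

try:
    predict_checked(model, rng.integers(0, 100, size=(10, 9)))
except ValueError as err:
    print(err)  # readable message naming the expected feature count
```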

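In other cases there may be a way to arrive at the same number of features, but that depends entirely on the assumptions, the type of data, or the model being tested.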
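One such case, sketched under an assumption: the `_v` suffix in the question's `X_train_v` suggests the features came from a text vectorizer, but the question does not say so. If that is the situation, reusing the already fitted vectorizer on the new data yields the same number of features. The texts and labels below are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training texts and labels
train_texts = ["free prize now", "meeting at noon", "win free cash", "lunch at noon"]
train_labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X_train_v = vectorizer.fit_transform(train_texts)  # learns the vocabulary
nb_model = MultinomialNB().fit(X_train_v, train_labels)

# A completely different dataset: call transform(), not fit_transform(),
# so the columns match the vocabulary learned from the training data.
new_texts = ["free cash prize", "team meeting after lunch"]
X_new_v = vectorizer.transform(new_texts)

print(X_new_v.shape[1] == X_train_v.shape[1])  # True

new_pred_class = nb_model.predict(X_new_v)
new_pred_probs = nb_model.predict_proba(X_new_v)
```

Words in the new texts that were not seen during training are simply ignored by `transform()`, which is why the feature count stays fixed.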

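Of course, this is only an illustration; making predictions on randomly generated data is meaningless. Instead, your new data should represent something related to the training data.

For example, it could be reasonable to use a model you trained on the reproduction rate of fire ants in Germany to predict the reproduction rate of fire ants in Austria.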
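If the new, related dataset also comes with labels, testing the trained model on it could look like the sketch below. The data generator `make_counts` is hypothetical and only stands in for two datasets drawn from a similar source; `accuracy_score` is one possible metric, not necessarily the right one for every problem.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(42)

def make_counts(n_samples):
    """Hypothetical data: class 1 has higher counts in the first 4 features."""
    y = rng.integers(0, 2, size=n_samples)
    X = rng.poisson(lam=5, size=(n_samples, 8))
    X[:, :4] += 10 * y[:, None]
    return X, y

X_train, y_train = make_counts(500)  # data the model is trained on
X_new, y_new = make_counts(100)      # separate dataset, same 8 features

model = MultinomialNB().fit(X_train, y_train)

# No refitting here: the trained model is only asked to predict
y_new_pred = model.predict(X_new)
accuracy = accuracy_score(y_new, y_new_pred)
print(accuracy)
```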

Solution 1
0 2021-06-08 07:14:33
