
Is there a straightforward approach to using a trained machine learning model on a brand-new set of data in Python?

I see similar questions on this topic when I search the Internet; however, most of the answers generate random data to illustrate a solution and do not explain what I am trying to understand in Python with sklearn's LogisticRegression.

I am trying to learn and understand machine learning model prediction. I visited Kaggle and downloaded the Titanic data to experiment with and build a survival prediction model. I was able to train a logistic regression model and save it for later.

import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# data_train is the Kaggle Titanic training DataFrame, loaded earlier (not shown)
X_train, X_test, y_train, y_test = train_test_split(
    data_train[['Sex', 'Pclass', 'Age', 'Relatives', 'Fare']],
    data_train.Survived,
    test_size=0.33,
    random_state=0,
)
# print(X_train.shape)
clf = LogisticRegression(random_state=0).fit(X_train, y_train)

# save the model to disk with joblib
filename = 'final_model_Joblib.sav'
joblib.dump(clf, filename)

I would now like to use this model on a brand-new Titanic data set to predict survival, which is not recorded in this new data set.

How would I go about applying my trained model to this new Titanic data set to make the prediction, where X_test and y_test represent my new Titanic data without survival data?

# load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, y_test)
print(result)

Well, the whole purpose of training a model is to predict on unseen data, provided the unseen data has the same features, with the same distributions, as your training data. When you dump a model using joblib or pickle, it serializes the model (converts it into a byte stream), and when you load it you get an equivalent object back. According to the sklearn docs, you can use loaded_model.predict(X) to get class predictions on unseen data. The score function returns the accuracy of your model, but it needs the true labels, so it only works on data where Survived is known; on your new data set, call predict instead. For more info, you can check this - https://www.geeksforgeeks.org/saving-a-machine-learning-model/ . Hope this answers your question.
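As a rough sketch of the load-then-predict step described above: the rows below are made-up stand-ins for the Kaggle files (not real Titanic records), Sex is assumed to be already encoded as 0/1, and max_iter=1000 is an assumption added only to avoid convergence warnings on unscaled data. The names data_new, Survived_pred and Survived_proba are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["Sex", "Pclass", "Age", "Relatives", "Fare"]

# Made-up stand-in for the training data (Sex assumed encoded as 0/1).
data_train = pd.DataFrame({
    "Sex":       [0, 1, 1, 0, 0, 1, 0, 1],
    "Pclass":    [3, 1, 3, 1, 3, 2, 2, 3],
    "Age":       [22.0, 38.0, 26.0, 54.0, 35.0, 14.0, 40.0, 4.0],
    "Relatives": [1, 1, 0, 0, 0, 1, 0, 2],
    "Fare":      [7.25, 71.28, 7.92, 51.86, 8.05, 30.07, 13.0, 16.7],
    "Survived":  [0, 1, 1, 0, 0, 1, 0, 1],
})

# Train and save, as in the question (written to a temp directory here).
clf = LogisticRegression(random_state=0, max_iter=1000)
clf.fit(data_train[FEATURES], data_train["Survived"])
filename = os.path.join(tempfile.mkdtemp(), "final_model_Joblib.sav")
joblib.dump(clf, filename)

# Made-up "brand new" passengers: same feature columns, no Survived column.
data_new = pd.DataFrame({
    "Sex":       [1, 0, 0],
    "Pclass":    [1, 3, 2],
    "Age":       [29.0, 41.0, 8.0],
    "Relatives": [0, 2, 1],
    "Fare":      [80.0, 7.75, 21.0],
})

# Load the saved model and predict; no labels are needed for predict().
loaded_model = joblib.load(filename)
X_new = data_new[FEATURES]  # same columns, same order as in training
data_new["Survived_pred"] = loaded_model.predict(X_new)
# Column 1 of predict_proba is the probability of class 1 (survived).
data_new["Survived_proba"] = loaded_model.predict_proba(X_new)[:, 1]
print(data_new)
```

The new data has to go through the same preprocessing as the training data (same columns, same encoding, same handling of missing values) before it is passed to predict. score(X, y) is left out on purpose, since the new data has no y to compare against.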


 