
How can I test my model using a different dataset in machine learning?

I'm new to machine learning and I'm building a small project using CountVectorizer. I split my data 80%/20%: 80% for training the model and 20% for testing it. The model works properly on the 20% test data, but can I also test it on a different dataset that is similar to the training dataset?

I am using joblib to dump and load my model.

from joblib import dump, load
dump(pipe, filename)

loaded_model = load(filename)  # load from the same path that was passed to dump

My question is: how can I directly test my model using a different dataset?

Yes, you can use the model to test similar datasets.

However, you must apply the same preprocessing steps that the model was trained with.

When you trained your model, it was trained on input of a particular dimension, say an A x B matrix (A documents by B vocabulary terms). When you have a new test sentence or a new dataset, you must first apply the same preprocessing; otherwise the model will throw a dimension mismatch error.
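To see where the mismatch comes from, here is a minimal sketch with made-up toy sentences (the variable names and the texts are assumptions for illustration, not from the question). It compares reusing the fitted vectorizer against fitting a fresh one on the new text:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpora, made up for illustration only.
train_texts = ["the cat sat on the mat", "the dog chased the cat"]
new_texts = ["a bird sang in the tree"]

# Fit once on the training text: this fixes the vocabulary (the columns).
cv = CountVectorizer()
X_train = cv.fit_transform(train_texts)

# Correct: reuse the fitted vectorizer, so the column count matches training.
X_new_ok = cv.transform(new_texts)

# Incorrect: a freshly fitted vectorizer learns a different vocabulary,
# so the number of columns no longer matches what the model expects.
X_new_bad = CountVectorizer().fit_transform(new_texts)

print(X_train.shape, X_new_ok.shape, X_new_bad.shape)
```

The second dimension of `X_new_ok` equals that of `X_train`, while `X_new_bad` generally does not, which is what triggers the error at prediction time.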

Example:

Suppose you have the following CountVectorizer object:

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()

Then you must first fit it on your training dataset, for example:

X = dataframe['text_column_name']
X = cv.fit_transform(X)  # learn the vocabulary from the training text and vectorize it

Once this is done, whenever you have a new sentence, say

test_sentence = "this is a test sentence"

then you must use the cv object in the following manner:

model_input = cv.transform([test_sentence]).toarray()

and then you can make predictions:

model.predict(model_input)

This method must be followed even if you wish to test a new dataset that is in a data frame or some other file format.
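For a whole new dataset, the flow might look like the sketch below. It assumes the saved `pipe` is a scikit-learn `Pipeline` that bundles the CountVectorizer with a classifier, which the question does not confirm. The classifier (`LogisticRegression`), the column names `text` and `label`, the toy rows, and the temporary file path are all hypothetical stand-ins.

```python
import os
import tempfile

import pandas as pd
from joblib import dump, load
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline

# Toy training data; the column names "text" and "label" are hypothetical.
train_df = pd.DataFrame({
    "text": ["great movie", "loved this film", "terrible plot", "awful acting"],
    "label": [1, 1, 0, 0],
})

# Assumption: the vectorizer and classifier live in one Pipeline, so the
# saved file carries the fitted vocabulary along with the model.
pipe = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", LogisticRegression()),
])
pipe.fit(train_df["text"], train_df["label"])

# Save and reload using the same path for dump and load.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
dump(pipe, model_path)
loaded_model = load(model_path)

# A different, labelled dataset with the same kind of text column.
new_df = pd.DataFrame({
    "text": ["great film", "awful plot"],
    "label": [1, 0],
})

# The pipeline applies the fitted vectorizer's transform internally,
# so raw text can be passed straight to predict.
predictions = loaded_model.predict(new_df["text"])
accuracy = accuracy_score(new_df["label"], predictions)
print(predictions, accuracy)
```

If only the classifier was saved rather than a pipeline, the fitted `cv` object would need to be dumped and loaded as well, and the new text passed through `cv.transform` before calling `predict`.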

