How can I test my training model using a different dataset in machine learning

Question

Hello, I am very new to Python and machine learning and I am running into an issue. I have split my data and finished training and testing my model, and now I need to test it on a completely different dataset.

Below is how I trained the model and ran it on my test set:

# Using NaiveBayes Classifier model
nb_model = sklearn.naive_bayes.MultinomialNB()
nb_model.fit(X_train_v, y_train)
y_pred_class = nb_model.predict(X_test_v)
y_pred_probs = nb_model.predict_proba(X_test_v)

What would I need to change so that I can run a new dataset through the trained model?

Thank you for your time and your help!

Answer 1

Functionally speaking, your new dataset must have the same number of features as the data the model was trained on.

If x_train.shape gives (752, 8) , then you know it has 8 features and 752 samples.

Once your model has been trained on it, model.n_features_in_ will give you 8 (this attribute is available on fitted estimators in scikit-learn 0.24 and later).

Your model can now predict outputs from data with 8 features:

import numpy as np
# 10 randomly generated samples with 8 features
new_dataset_1 = np.random.randint(0, 100, size=(10, 8))
new_pred_1 = model.predict(new_dataset_1)
# > array([47, 15,  2, 81, 99, 63, 53, 55, 24, 47])
new_pred_1.shape
# > (10, )  # One predicted class per sample

If you try to predict from data with a different number of features, it will fail:

# 10 randomly generated samples with 9 features
new_dataset_2 = np.random.randint(0, 100, size=(10, 9))
new_pred_2 = model.predict(new_dataset_2)
# > ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0,
# with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 8 is different from 9)
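
One way to catch this mismatch before calling predict is to compare the column count of the new data with what the fitted model expects. The sketch below assumes scikit-learn 0.24 or later, where fitted estimators expose `n_features_in_`. The helper name `check_feature_count` and the toy data are hypothetical, made up for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB


def check_feature_count(model, X_new):
    """Return True if X_new has as many columns as the fitted model expects."""
    return X_new.shape[1] == model.n_features_in_


# Toy training data (hypothetical): 20 samples, 8 non-negative features, 2 classes
rng = np.random.default_rng(0)
X_train = rng.integers(0, 100, size=(20, 8))
y_train = np.array([0, 1] * 10)

model = MultinomialNB().fit(X_train, y_train)

X_ok = rng.integers(0, 100, size=(10, 8))   # same 8 features as training
X_bad = rng.integers(0, 100, size=(10, 9))  # one feature too many

ok_matches = check_feature_count(model, X_ok)
bad_matches = check_feature_count(model, X_bad)

if ok_matches:
    preds = model.predict(X_ok)  # one predicted class per row
```

A check like this only confirms the shape. It cannot tell you whether the columns mean the same thing as they did during training.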

If the new data does not naturally have the same number of features, there may be ways to bring it to the same shape, but whether that is valid depends on the hypothesis, on the kind of data, and on the model being tested.
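
For text data like that in the question, the `_v` suffix on `X_train_v` suggests the features came from a vectorizer. That is an assumption, since the question does not show that step. If so, a common way to get matching features is to reuse the already fitted vectorizer and call `transform` (not `fit_transform`) on the new text. The sketch below assumes a `CountVectorizer`; the names `vect`, `new_texts`, `new_labels`, and all of the sample data are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy training data (hypothetical)
train_texts = [
    "free prize money now",
    "win free money",
    "meeting at noon",
    "lunch meeting tomorrow",
]
y_train = ["spam", "spam", "ham", "ham"]

# Fit the vectorizer on the training text only; this fixes the vocabulary
vect = CountVectorizer()
X_train_v = vect.fit_transform(train_texts)

nb_model = MultinomialNB()
nb_model.fit(X_train_v, y_train)

# A completely different dataset (hypothetical), with its own labels
new_texts = ["free money prize", "meeting tomorrow at noon", "claim your reward"]
new_labels = ["spam", "ham", "spam"]

# transform() maps the new text onto the training vocabulary,
# so the column count matches what the model was trained on
X_new_v = vect.transform(new_texts)

new_pred_class = nb_model.predict(X_new_v)
new_pred_probs = nb_model.predict_proba(X_new_v)

# If the new dataset has labels, it can be scored like the original test set
new_accuracy = accuracy_score(new_labels, new_pred_class)
```

Calling `fit_transform` on the new text instead would build a new vocabulary, which would most likely produce a different number of columns and columns with different meanings.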

Of course, this is just an illustration and it doesn't make any sense to predict on randomly generated data. Your new data should instead represent something that is related to the training data.

For example, you can consider that it is reasonable to try to predict the reproductive rate of fire ants from Austria with a model that you trained on the reproductive rate of fire ants from Germany.
