test machine learning model with categorical variable in python

Question

I have a data set like this:

As you can see, there is one categorical variable, State.

Later, I encode the categorical variable.

If I want to test my model with specific data, I do something like this:

print(regressor.predict([[1,0,1000,2000,3000]]))

This works fine. But while testing, I want to input the state name directly, for example New York or Florida.

How can I achieve this ?

Answer 1

A machine learning model can only work on numeric data, which is why you had to encode your states in the first place. There are a few ways to achieve what you want:

a) Use a function that returns the encoded value of the state, so that you can enter something like:

print(regressor.predict([[1,0,1000,func("New York"),3000]]))
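A minimal sketch of what `func` could look like, assuming the state was dummy-encoded with its categories in sorted order and the first one dropped (as `pandas.get_dummies(..., drop_first=True)` does). The names `make_state_encoder` and `encode_state`, and the list of training states, are hypothetical. Because a dummy encoding spans several columns, the function returns a list that goes where the state columns sat during training, rather than a single value.

```python
def make_state_encoder(train_states):
    """Build a function mapping a state name to its dummy-encoded columns.

    Assumption: categories are sorted and the first one is dropped,
    mirroring pandas.get_dummies(..., drop_first=True).
    """
    categories = sorted(set(train_states))
    kept = categories[1:]  # the first (baseline) category is encoded as all zeros

    def encode_state(name):
        if name not in categories:
            raise ValueError(f"Unknown state: {name!r}")
        return [1 if name == c else 0 for c in kept]

    return encode_state


# Hypothetical training states
encode_state = make_state_encoder(["New York", "California", "Florida", "New York"])

# State columns first, then the numeric features, as in the question's row
row = encode_state("New York") + [1000, 2000, 3000]
print(row)  # [0, 1, 1000, 2000, 3000]
```

The resulting `row` can then be passed as `regressor.predict([row])`, provided the column order really matches the one used for training.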

b) Use implicit encoding (i.e. one-hot encoding), which creates one column for each category of every categorical variable.
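To illustrate option b), here is a small sketch using pandas' `get_dummies` on a made-up data frame (the column names `State` and `Spend` are assumptions). The detail that usually trips people up is that encoding a single test row on its own only produces columns for the categories present in that row, so the test frame has to be aligned with the training columns.

```python
import pandas as pd

# Hypothetical training data with one categorical column, "State"
train = pd.DataFrame({
    "State": ["New York", "California", "Florida", "New York"],
    "Spend": [1000, 1500, 1200, 900],
})

# get_dummies replaces "State" with one 0/1 column per state value
X_train = pd.get_dummies(train, columns=["State"], dtype=int)
print(list(X_train.columns))
# ['Spend', 'State_California', 'State_Florida', 'State_New York']

# A single test row written with the raw state name
test = pd.DataFrame({"State": ["Florida"], "Spend": [1000]})

# Encoding the test row alone only yields a State_Florida column, so
# reindex to the training columns and fill the missing dummies with 0
X_test = pd.get_dummies(test, columns=["State"], dtype=int).reindex(
    columns=X_train.columns, fill_value=0
)
print(X_test.values.tolist())  # [[1000, 0, 1, 0]]
```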

Answer 2

Since ML models only take numbers as input, even the test data set must be encoded before it is passed to the model.
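A sketch of that idea with scikit-learn's `OneHotEncoder`, using hypothetical state names: the encoder is fitted once on the training data, and the same fitted object is reused for every test input. `handle_unknown="ignore"` is a choice made for this example; it encodes a state never seen in training as all zeros instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training column of state names (2D: one row per sample)
train_states = np.array([["New York"], ["California"], ["Florida"], ["New York"]])

# Fit the encoder on the training data only
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_states)
print(encoder.categories_[0].tolist())  # ['California', 'Florida', 'New York']

# Reuse the same fitted encoder on test inputs; never refit it on test data
test_states = np.array([["Florida"], ["Texas"]])
encoded = encoder.transform(test_states).toarray()  # transform returns a sparse matrix by default
print(encoded.tolist())  # [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```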

Answer 3

You could use scikit-learn's LabelEncoder to transform and inverse-transform the categorical values.

For example:

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["New York", "Florida", "US", "Florida", "New York"])
LabelEncoder()
>>> le.transform(["New York", "Florida", "US", "Florida", "New York"])
array([1, 0, 2, 0, 1])
>>> le.inverse_transform([0])
array(['Florida'], dtype='<U8')

You can then call your model like this:

print(regressor.predict([[1,0,1000,le.transform(["New York"])[0],3000]]))
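Two details about `LabelEncoder` are worth knowing before relying on it: its integer codes follow the sorted order of `classes_`, not the order in which values first appear, and `transform` raises a `ValueError` for a label it did not see during `fit`. The sketch below wraps that in a small helper (`state_code` is a hypothetical name). Note also that scikit-learn documents `LabelEncoder` as intended for target values; for input features, `OrdinalEncoder` or `OneHotEncoder` is the usual choice, and a single integer column only matches a model that was trained on that same integer column.

```python
from sklearn.preprocessing import LabelEncoder

# Fit once, on the state names that appear in the training data
le = LabelEncoder()
le.fit(["New York", "Florida", "US", "Florida", "New York"])
print(le.classes_.tolist())  # ['Florida', 'New York', 'US'] (codes follow this sorted order)


def state_code(name):
    """Return the integer code for a state, with a readable error for unseen names."""
    if name not in le.classes_:
        raise ValueError(f"{name!r} was not seen during training; known: {le.classes_.tolist()}")
    return int(le.transform([name])[0])


row = [1, 0, 1000, state_code("New York"), 3000]
print(row)  # [1, 0, 1000, 1, 3000]
```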

Answer 4

As others have mentioned, any model takes only numbers as input. For this reason, we usually create a preprocessing function that can be applied to both the training and test sets.

In this case, you need to define a function that transforms the input vector into a numerical vector, which can then be fed to your machine learning model:

Inputs -> Preprocessing -> Model

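This preprocessing must be exactly the same as the one you used for training; otherwise the model receives inputs encoded differently from the data it learned from, and its predictions will not be what you expect.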
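scikit-learn ships this "Inputs -> Preprocessing -> Model" idea as `Pipeline`, combined here with `ColumnTransformer` so that only the categorical column is encoded. A minimal sketch, assuming a data frame with three numeric columns and a raw `State` column; the column names and numbers are made up for illustration. Because the encoder is fitted inside the pipeline, prediction accepts the state name as plain text.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: three numeric columns plus a raw "State" column
X_train = pd.DataFrame({
    "RD": [1000, 2000, 3000, 4000, 5000, 6000],
    "Admin": [500, 400, 300, 600, 700, 200],
    "Marketing": [3000, 1000, 2000, 5000, 4000, 6000],
    "State": ["New York", "Florida", "California", "New York", "Florida", "California"],
})
y_train = [10, 20, 30, 40, 50, 60]

# Preprocessing: one-hot encode "State", pass the numeric columns through unchanged
preprocess = ColumnTransformer(
    [("state", OneHotEncoder(handle_unknown="ignore"), ["State"])],
    remainder="passthrough",
)

# Inputs -> Preprocessing -> Model, bundled into one estimator
model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
model.fit(X_train, y_train)

# At prediction time you pass the raw state name; the pipeline encodes it
X_new = pd.DataFrame(
    {"RD": [1000], "Admin": [2000], "Marketing": [3000], "State": ["New York"]}
)
print(model.predict(X_new))
```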

So typically, when you create a model, your complete 'Model' can actually be a wrapper around the underlying model you use. For instance:

class MyModel:

    def __init__(self, model):
        # Inputs and other variables like hyperparameters
        self.model = model  # The underlying model of your choice

    def preprocess(self, rows):
        # Turn each raw input row into a numeric vector, applying
        # exactly the same encoding that was used at training time
        raise NotImplementedError("implement your encoding here")

    def train(self, X_train, y_train):
        self.model.fit(self.preprocess(X_train), y_train)

    def predict(self, X_test):
        # If X_test is a single vector, wrap it into a batch, then preprocess
        if not isinstance(X_test[0], (list, tuple)):
            X_test = [X_test]
        return self.model.predict(self.preprocess(X_test))

So, finally, to predict you call MyModel.predict() rather than the underlying Model.predict(); the wrapper takes care of the preprocessing for you.
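For a concrete, runnable instance of this wrapper pattern, here is a sketch in which `preprocess` one-hot encodes the state name. The class name `StateModel`, the raw row layout `[num1, num2, num3, state_name]`, and the use of scikit-learn's `LinearRegression` as the underlying model are all assumptions for illustration. The category order is fixed once in `train` and reused in `predict`, which is what keeps the two consistent.

```python
from sklearn.linear_model import LinearRegression


class StateModel:
    """Wrap a model so callers can pass raw rows containing a state name.

    Assumed (hypothetical) raw row layout: [num1, num2, num3, state_name].
    """

    def __init__(self, model):
        self.model = model
        self.states = None  # learned in train()

    def preprocess(self, rows):
        # One 0/1 column per training state, followed by the numeric features
        encoded = []
        for *numbers, state in rows:
            if state not in self.states:
                raise ValueError(f"Unknown state: {state!r}")
            dummies = [1 if state == s else 0 for s in self.states]
            encoded.append(dummies + numbers)
        return encoded

    def train(self, rows, targets):
        # Fix the category order once, from the training data only
        self.states = sorted({row[-1] for row in rows})
        self.model.fit(self.preprocess(rows), targets)

    def predict(self, rows):
        # Accept a single raw row as well as a list of rows
        if not isinstance(rows[0], (list, tuple)):
            rows = [rows]
        return self.model.predict(self.preprocess(rows))


rows = [
    [1000, 500, 3000, "New York"],
    [2000, 400, 1000, "Florida"],
    [3000, 300, 2000, "California"],
    [4000, 600, 5000, "New York"],
    [5000, 700, 4000, "Florida"],
]
wrapped = StateModel(LinearRegression())
wrapped.train(rows, [10, 20, 30, 40, 50])
print(wrapped.predict([1000, 2000, 3000, "New York"]))
```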

Answer 5

This is not elegant at all, but you can just write an if/elif statement depending on the input, like:

a = input("Please enter the state: ").strip()
if a == "New York":
    print(regressor.predict([[1,0,1000,2000,3000]]))
elif a == "Florida":
    print(regressor.predict([[0,1,1000,2000,3000]]))
else:
    print("Invalid state selected")
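A somewhat tidier version of the same idea replaces the if/elif chain with a dictionary lookup. In this sketch the name `build_row` is hypothetical, and the encoded columns are copied from the branches above (New York -> `[1, 0]`, Florida -> `[0, 1]`); they must match whatever encoding your model was actually trained with.

```python
# Hypothetical lookup table: the dummy columns for each state,
# taken from the if/elif branches (New York -> [1, 0], Florida -> [0, 1])
STATE_COLUMNS = {
    "New York": [1, 0],
    "Florida": [0, 1],
}


def build_row(state, numeric_features):
    """Return the model input row for a state name, or raise on unknown input."""
    key = state.strip().title()  # tolerate input such as "new york "
    if key not in STATE_COLUMNS:
        raise ValueError(f"Invalid state selected: {state!r}")
    return STATE_COLUMNS[key] + list(numeric_features)


print(build_row("new york", [1000, 2000, 3000]))  # [1, 0, 1000, 2000, 3000]
```

With that in place, the interactive version becomes `print(regressor.predict([build_row(input("Please enter the state: "), [1000, 2000, 3000])]))`.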

5 answers

solution1 3 2018-09-27 10:18:05
solution2 3 2018-09-27 10:19:46
solution3 3 2018-09-27 10:25:17
solution4 3 2018-09-27 10:39:05
solution5 1 2018-09-27 10:21:38
