I have a dataset like this:
As you can see, there is one categorical variable, State.
Later, I encode the categorical variable.
If I want to test my model with specific data, I do something like this:
print(regressor.predict([[1,0,1000,2000,3000]]))
This works fine. But while testing, I want to input the state name directly, like New York or Florida.
How can I achieve this?
A machine learning model can only work on numeric data, which is why you had to encode your states in the first place. There are a few ways to achieve what you want:
a) Use a function that returns the encoded values of the state (its dummy columns), so that you can write something like
print(regressor.predict([[*func("New York"), 1000, 2000, 3000]]))
b) Use implicit encoding, which automatically creates one column per category of each categorical variable.
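Option a) could look roughly like the sketch below, assuming scikit-learn's OneHotEncoder is used for the dummy columns. The training states and the name `func` are placeholders; fit the encoder on your real State column. The number of dummy columns must match what the model saw during training (the question's rows use two, which suggests one dummy was dropped, e.g. with `OneHotEncoder(drop="first")`).

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training states; use the State column of your real data instead.
train_states = np.array([["New York"], ["Florida"], ["California"]])

# Fit the encoder once, on the training data only.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_states)

def func(state):
    """Return the one-hot (dummy) columns for a single state name."""
    return encoder.transform([[state]]).toarray()[0].tolist()

# Dummy columns first, then the three numeric features, as in the question.
row = [*func("New York"), 1000, 2000, 3000]
print(row)  # [0.0, 0.0, 1.0, 1000, 2000, 3000]
```

The row can then be passed as `regressor.predict([row])`. Categories are sorted alphabetically, so here California, Florida and New York occupy the three dummy positions in that order.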
Since ML models only accept numbers as input, even the test dataset has to be encoded before it is passed to the model.
You could use scikit-learn's LabelEncoder to transform the categorical values and to inverse-transform them back.
For example:
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["New York", "Florida", "US", "Florida", "New York"])
LabelEncoder()
>>> le.transform(["New York", "Florida", "US", "Florida", "New York"])
array([1, 0, 2, 0, 1])
>>> le.inverse_transform([1])
array(['New York'], dtype='<U8')
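If your model was trained with the label-encoded state in a single column (assumed here to be the first one), you can call it like this:
print(regressor.predict([[le.transform(["New York"])[0], 1000, 2000, 3000]]))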
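An end-to-end sketch of the LabelEncoder approach, assuming the model is trained on the label-encoded state in column 0 followed by three numeric features. The training rows and targets are made-up numbers, and `predict_for` is a hypothetical helper. The key point is that the same fitted encoder is reused at prediction time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

# Made-up training rows: a state name plus three numeric features.
raw_states = ["New York", "Florida", "New York", "Florida"]
numeric = np.array([[1000, 2000, 3000],
                    [1500, 2500, 3500],
                    [2000, 3000, 4000],
                    [2500, 3500, 4500]], dtype=float)
y = np.array([10.0, 12.0, 14.0, 16.0])

le = LabelEncoder()
state_codes = le.fit_transform(raw_states)          # Florida -> 0, New York -> 1
X_train = np.column_stack([state_codes, numeric])   # state code in column 0
regressor = LinearRegression().fit(X_train, y)

def predict_for(state, rd_spend, admin, marketing):
    """Encode the raw state with the same fitted encoder, then predict."""
    code = le.transform([state])[0]  # raises ValueError for unseen states
    return regressor.predict([[code, rd_spend, admin, marketing]])[0]

print(predict_for("New York", 1000, 2000, 3000))
```

Keep in mind that label encoding imposes an arbitrary order on the states, which a linear model will treat as meaningful; that is one reason one-hot encoding is usually preferred for this kind of regressor.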
As others have mentioned, any model takes only numbers as input. For this reason, we usually write a preprocessing function that is applied in exactly the same way to both the training and test sets.
In this case, you need to define a function that transforms the input vector into a numerical vector that can then be fed to your machine learning model:
Inputs -> Preprocessing -> Model
This preprocessing must be exactly the same as the one used during training; otherwise the model receives inputs laid out differently from what it learned on, and its predictions are meaningless.
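One way to guarantee that training and prediction share the same preprocessing is to bundle both steps into a scikit-learn Pipeline. This is a swapped-in technique, not the wrapper class this answer describes. The column names below are assumptions modelled on the question, and the data is made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training data; the column names are assumptions.
train = pd.DataFrame({
    "R&D Spend": [1000, 1500, 2000, 2500],
    "Administration": [2000, 2500, 3000, 3500],
    "Marketing Spend": [3000, 3500, 4000, 4500],
    "State": ["New York", "Florida", "New York", "Florida"],
})
profit = [10.0, 12.0, 14.0, 16.0]

# Preprocessing step: one-hot encode "State", pass the numeric columns through.
preprocess = ColumnTransformer(
    [("state", OneHotEncoder(handle_unknown="ignore"), ["State"])],
    remainder="passthrough",
)

# Inputs -> Preprocessing -> Model, bundled into one estimator.
model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
model.fit(train, profit)

# At prediction time, pass the raw state name; the pipeline encodes it.
query = pd.DataFrame([{"R&D Spend": 1000, "Administration": 2000,
                       "Marketing Spend": 3000, "State": "New York"}])
print(model.predict(query))
```

Because the encoder is fitted inside the pipeline, `model.predict` always applies the same encoding that `model.fit` learned.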
Typically, when you create a model, your complete 'Model' can be a wrapper around the actual model you use. For instance:
class MyModel:
    def __init__(self, model):
        # Inputs and other variables like hyperparameters
        self.model = model  # Initialise with a model of your choice

    def preprocess(self, X):
        # Turn the raw inputs (e.g. state names) into numbers; this must be
        # exactly the same transformation at training and prediction time
        raise NotImplementedError

    def train(self, X_train, y_train):
        self.model.fit(self.preprocess(X_train), y_train)

    def predict(self, X_test):
        # If X_test is a single vector, reshape it first, then preprocess;
        # evaluate the returned predictions against y_test outside this method
        return self.model.predict(self.preprocess(X_test))
Finally, to make predictions you call MyModel.predict() rather than Model.predict(), so the raw inputs are always preprocessed first.
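A filled-in version of the wrapper idea, assuming pandas `get_dummies` as the preprocessing and the same hypothetical column names as the question's dataset. The detail that matters is the `reindex`: a single test row only produces a dummy column for its own state, so it has to be aligned to the column layout learned during training.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

class MyModel:
    """Sketch of the wrapper: preprocessing and model in one object."""

    def __init__(self, model):
        self.model = model
        self.columns = None  # column layout learned in train()

    def preprocess(self, df):
        # One-hot encode "State"; reindex so every input gets exactly the
        # training columns (missing dummies become 0, unseen ones are dropped).
        X = pd.get_dummies(df, columns=["State"], dtype=float)
        if self.columns is None:
            self.columns = X.columns
        return X.reindex(columns=self.columns, fill_value=0.0)

    def train(self, df, y):
        self.columns = None  # relearn the layout from the training data
        self.model.fit(self.preprocess(df), y)

    def predict(self, df):
        return self.model.predict(self.preprocess(df))

# Made-up training data for illustration.
train_df = pd.DataFrame({
    "R&D Spend": [1000, 1500, 2000, 2500],
    "Administration": [2000, 2500, 3000, 3500],
    "Marketing Spend": [3000, 3500, 4000, 4500],
    "State": ["New York", "Florida", "New York", "Florida"],
})
m = MyModel(LinearRegression())
m.train(train_df, [10.0, 12.0, 14.0, 16.0])

query = pd.DataFrame([{"R&D Spend": 1000, "Administration": 2000,
                       "Marketing Spend": 3000, "State": "New York"}])
print(m.predict(query))
```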
This is not elegant at all, but you can simply write an if...elif statement based on the input, like:
a = input("Please enter the state: ")
if a == "New York":
    print(regressor.predict([[1, 0, 1000, 2000, 3000]]))
elif a == "Florida":
    print(regressor.predict([[0, 1, 1000, 2000, 3000]]))
else:
    print("Invalid state selected")
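A slightly tidier variant of the same idea replaces the if/elif chain with a dictionary lookup. The mapping below just copies the dummy values from the example above; `STATE_DUMMIES`, `make_row`, and its default arguments are hypothetical names chosen for illustration.

```python
# Hypothetical mapping, copied from the dummy values in the if/elif example.
STATE_DUMMIES = {
    "New York": [1, 0],
    "Florida": [0, 1],
}

def make_row(state, rd_spend=1000, admin=2000, marketing=3000):
    """Build a model input row from a state name, or raise for unknown states."""
    try:
        dummies = STATE_DUMMIES[state]
    except KeyError:
        raise ValueError(f"Invalid state selected: {state!r}") from None
    return [dummies + [rd_spend, admin, marketing]]

print(make_row("Florida"))  # [[0, 1, 1000, 2000, 3000]]
```

With this, the prediction becomes `print(regressor.predict(make_row(a)))`, and adding a state means adding one dictionary entry instead of another branch.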