
ValueError: X has 13 features, but RandomForestClassifier is expecting 30 features as input

I get this error when I try to make a prediction:

input_data = [[58, 0, 0, 100, 248, 0, 0, 122, 0, 1, 1, 0, 2]]
prediction = random_forest.predict(input_data)
print(prediction)

I used the get_dummies() method on the categorical data, so the number of features increased to 30:

categorical_val.remove('target')
dataset = pd.get_dummies(df, columns = categorical_val)
# dataset=df
from sklearn.preprocessing import StandardScaler

s_sc = StandardScaler()
col_to_scale = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
dataset[col_to_scale] = s_sc.fit_transform(dataset[col_to_scale])

I have used several classification models, one of which is RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
# create classifier object
random_forest = RandomForestClassifier(n_estimators = 100, random_state = 0)
random_forest.fit(X_train, y_train) 
pred=random_forest.predict(X_test)

Error:

/usr/local/lib/python3.7/dist-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
  "X does not have valid feature names, but"
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-be47ec6e672c> in <module>()
      8 
      9 input_data=[[58,        0,      0,      100,    248,    0,      0,      122,    0,      1,      1,      0,      2]]
---> 10 prediction = random_forest.predict(input_data)
     11 print(prediction)
     12 

4 frames
/usr/local/lib/python3.7/dist-packages/sklearn/base.py in _check_n_features(self, X, reset)
    399         if n_features != self.n_features_in_:
    400             raise ValueError(
--> 401                 f"X has {n_features} features, but {self.__class__.__name__} "
    402                 f"is expecting {self.n_features_in_} features as input."
    403             )

ValueError: X has 13 features, but RandomForestClassifier is expecting 30 features as input.

I know I am getting this error because of the get_dummies() method, but if I don't use it, the accuracy of the models changes.

The problem is that the model was trained on 30 features, but input_data=[[58, 0, 0, 100, 248, 0, 0, 122, 0, 1, 1, 0, 2]] has only the 13 original columns, so it does not match the shape of the training data.

get_dummies() converts each distinct value in a categorical column into a separate column. If a column contains the values 1, 2, 3 (or a, b, c), get_dummies() creates three columns from it. Each new column holds 0 or 1: 0 means that category is absent and 1 means it is present.

So before predicting, expand the categorical values in your input the same way, so that each categorical column becomes as many columns as it has categories. For example, take data with 3 columns, [[2,2,3]], where the first two columns have 2 categories each and the third has 3. The expanded dataset has the columns [1,2,1,2,1,2,3], and the value [[2,2,3]] in expanded form is [0,1,0,1,0,0,1]. I hope this helps.
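One possible way to do that expansion without writing the 0/1 values by hand is to run get_dummies() on the new row and then align it to the training columns with reindex(). The sketch below reuses the toy 3-column example; the column names a, b, c, the toy training frame and the helper encode_like_training are hypothetical and only for illustration, not taken from the question's dataset.

```python
import pandas as pd

# Toy training data (hypothetical): "a" and "b" have 2 categories, "c" has 3.
train = pd.DataFrame({
    "a": [1, 2, 1, 2],
    "b": [1, 2, 2, 1],
    "c": [1, 2, 3, 1],
})
cat_cols = ["a", "b", "c"]

# Encode the training data and remember the resulting column layout.
train_encoded = pd.get_dummies(train, columns=cat_cols)
train_columns = train_encoded.columns


def encode_like_training(raw_rows, raw_columns, cat_cols, train_columns):
    """Hypothetical helper: one-hot encode new rows to match the training layout."""
    new = pd.DataFrame(raw_rows, columns=raw_columns)
    new = pd.get_dummies(new, columns=cat_cols)
    # Add dummy columns the new rows lack (filled with 0), drop any
    # columns unseen in training, and put everything in training order.
    return new.reindex(columns=train_columns, fill_value=0)


sample = encode_like_training([[2, 2, 3]], ["a", "b", "c"], cat_cols, train_columns)
row = sample.astype(int).values.tolist()
print(list(train_columns))
print(row)
```

With the question's data, the same idea would presumably mean saving the columns of X_train after get_dummies(), building the 13-value input as a DataFrame with the original column names, and applying the already fitted scaler with s_sc.transform() (not fit_transform()) to col_to_scale before calling random_forest.predict(). Passing a DataFrame with the training column names should also avoid the "X does not have valid feature names" warning.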
