I am trying to get the top 5 features for my dataframe df with X_train and y_train.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

bestfeatures = SelectKBest(score_func=chi2, k=5)  # k=5 means select the top 5 features
fit = bestfeatures.fit(X_train, y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
# concatenate the two dataframes for better visualization
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Features', 'Score']  # name the dataframe columns
print(featureScores.nlargest(5, 'Score'))  # print the 5 best features
Error
ValueError Traceback (most recent call last)
<ipython-input-54-47286ab0e6e9> in <module>
6
7 bestfeatures = SelectKBest(score_func=chi2, k=5)
----> 8 fit = bestfeatures.fit(X_train,y_train)
ValueError: Unknown label type: (array([23.5, 35, 38.......
.......]),)
P.S. My y_train contains values such as 23.5, 35, 38 and so on, as shown in the ValueError.
How to solve this?
Your score function is chi2, so you are doing classification, not regression. The target must therefore take values from a finite set of class labels (strings, integers, etc.); a continuous float target such as 23.5 can only be used for regression, which is why fit raises "Unknown label type".
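If the target really is continuous, one possible way around the error is to keep SelectKBest but swap chi2 for a regression score function such as f_regression (mutual_info_regression is another option in the same scikit-learn module). The sketch below is only an illustration: the data is synthetic, and the column names f0 to f7 and the formula for y_train are assumptions made up for the example, not the asker's data.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in data (hypothetical): 8 features and a continuous target.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.random((200, 8)), columns=[f"f{i}" for i in range(8)])
y_train = 30 + 10 * X_train["f0"] - 5 * X_train["f3"] + rng.normal(0, 0.1, 200)

# f_regression accepts a continuous target, unlike chi2.
selector = SelectKBest(score_func=f_regression, k=5)
selector.fit(X_train, y_train)

# Pair each feature name with its score and keep the 5 highest.
feature_scores = pd.DataFrame({"Features": X_train.columns, "Score": selector.scores_})
top5 = feature_scores.nlargest(5, "Score")
print(top5)
```

If chi2 is genuinely what you want, the target would first have to be turned into discrete classes (for example by binning), which changes the problem from regression to classification.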