Feature importance in Python using SelectKBest

Question

I am trying to get the top 5 features of my dataframe df, using X_train and y_train.

import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# k=5 means select the top 5 features
bestfeatures = SelectKBest(score_func=chi2, k=5)
fit = bestfeatures.fit(X_train, y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
# concatenate the two dataframes for better visualization
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Features', 'Score']  # name the dataframe columns
print(featureScores.nlargest(5, 'Score'))  # print the 5 best features

Error

ValueError                                Traceback (most recent call last)
<ipython-input-54-47286ab0e6e9> in <module>
      6 
      7 bestfeatures = SelectKBest(score_func=chi2, k=5)
----> 8 fit = bestfeatures.fit(X_train,y_train)
   
    ValueError: Unknown label type: (array([23.5, 35, 38.......
   .......]),)

PS: My y_train contains values such as 23.5, 35, 38 and so on, as shown in the ValueError.

How can I solve this?

Answer 1

Your score function is chi2, so you are doing classification, not regression. You must therefore pass target values drawn from a finite set of classes (such as strings or integers); float targets can only be used for regression. That is why fit raises "Unknown label type" for y values like 23.5.
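One way to act on this, assuming the target really is continuous and the task is regression, is to swap chi2 for a regression score function such as scikit-learn's f_regression. The sketch below is not the asker's data: X_train and y_train are synthetic stand-ins, and the column names f0 to f7 are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for the asker's data: 8 numeric features, float target.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.random((200, 8)), columns=[f"f{i}" for i in range(8)])
y_train = 30 * X_train["f0"] + 10 * X_train["f3"] + rng.normal(0, 0.5, 200)

# A non-integer float target is classified as 'continuous',
# which is the kind of label chi2 does not accept.
print(type_of_target(y_train.to_numpy()))

# f_regression scores each feature against a continuous target.
selector = SelectKBest(score_func=f_regression, k=5)
selector.fit(X_train, y_train)

# Same presentation as in the question: feature names next to their scores.
feature_scores = pd.DataFrame(
    {"Features": X_train.columns, "Score": selector.scores_}
)
top5 = feature_scores.nlargest(5, "Score")
print(top5)
```

f_regression only captures linear dependence between each feature and the target; mutual_info_regression from the same module is an alternative if nonlinear relationships matter. If the values are in fact class labels stored as floats, converting them to integers or strings and keeping chi2 would be the other route.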

Answer 1 (accepted, 2021-01-17 21:09:14)
