
Feature selection from sklearn logistic regression

I have created a binary text classification model using scikit-learn's logistic regression. Now I want to select the features the model uses. My code looks like this:

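```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# `data` (NumPy array of raw texts) and `lab` (binary labels) are defined elsewhere
train, val, y_train, y_test = train_test_split(np.arange(data.shape[0]), lab, test_size=0.2, random_state=0)
X_train = data[train]
X_test = data[val]

#X_train, X_test, y_train, y_test = train_test_split(data, lab, test_size=0.2)
# min_df=1 keeps every term; newer sklearn rejects the integer 0
tfidf_vect = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
X_tfidf_train = tfidf_vect.fit_transform(X_train)  # learn vocabulary and IDF on training text only
X_tfidf_test = tfidf_vect.transform(X_test)        # reuse the fitted vocabulary on the test text
# The L1 penalty needs a solver that supports it (liblinear or saga); the default lbfgs does not
clf_lr = LogisticRegression(penalty='l1', solver='liblinear')
clf_lr.fit(X_tfidf_train, y_train)
feature_names = tfidf_vect.get_feature_names_out()  # use get_feature_names() on sklearn < 1.0
print(len(feature_names))
y_pred_lr = clf_lr.predict_proba(X_tfidf_test)[:, 1]
```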

What is the best approach to select the features the model uses?
