
Building a predictive model with text data and other predictors

I'm trying to build a predictive model (random forest, SGD, etc.) using scikit-learn, and every example I find fits the model on text features only, like this:

classifier.fit(X, Y)

...where Y is the target and X is a text feature matrix (count_vec -> tf_idf). Is there any way to fit a model that uses several categorical variables in addition to the text feature matrix? Can I simply append them as new columns on the right side of X?

You will need to convert the categorical data to numbers first. Simply appending string categories to the numeric values produced by a feature extractor such as TfidfVectorizer or CountVectorizer will not work. Here's an SO question and answer on converting categories into numerical feature data, which you can then append as extra columns on the right side of X.
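As a rough illustration of that approach, here is a minimal sketch. It assumes one-hot encoding is an acceptable way to turn the categories into numbers, and it uses scipy.sparse.hstack to append the columns because the vectorizer output is a sparse matrix. The toy texts, the single "source" category column, and the labels are made up for the example.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder

# Toy data, invented purely for illustration.
texts = ["cheap pills online", "meeting at noon", "win money now", "lunch tomorrow?"]
categories = np.array([["web"], ["email"], ["web"], ["email"]])  # one categorical column
y = np.array([1, 0, 1, 0])

# Text -> sparse TF-IDF matrix, one column per vocabulary term.
text_vec = TfidfVectorizer()
X_text = text_vec.fit_transform(texts)

# String categories -> sparse one-hot columns (numeric).
cat_enc = OneHotEncoder(handle_unknown="ignore")
X_cat = cat_enc.fit_transform(categories)

# Append the encoded categories to the right of the text features.
X = hstack([X_text, X_cat]).tocsr()

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

# New samples must pass through the same fitted transformers, in the same order.
X_new = hstack([
    text_vec.transform(["free money"]),
    cat_enc.transform([["web"]]),
]).tocsr()
pred = clf.predict(X_new)
```

If the data lives in a pandas DataFrame, scikit-learn's ColumnTransformer can wrap the same two steps in a single object, which may be easier to keep consistent between training and prediction.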

