I'm trying to build a predictive model (random forest, SGD, etc.) using scikit-learn, and it seems like every model only allows you to fit text data, such as
classifier.fit(X,Y)
...where Y is the target and X is a text feature vector (count_vec -> tf_idf). Is there any way to have a model which, in addition to the text feature matrix, also contains several categorical variables? Can I simply append them as new columns on the right side of X?
You will need to convert the categorical data to numbers first. Simply appending string categories to the numeric values produced by a feature extractor such as TfidfVectorizer (or CountVectorizer followed by TfidfTransformer) will not work. Here's a SO question and answer on converting categories into numerical feature data that you can then append on the right.
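As a rough illustration of that approach, here is a minimal sketch that one-hot encodes the categorical columns and stacks them to the right of the TF-IDF matrix. The toy texts, the two categorical columns, and the labels are made up for the example, and OneHotEncoder is just one common way to do the conversion; it is not necessarily the method the linked answer uses.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one text field plus two categorical columns per row.
texts = [
    "cheap flights to paris",
    "best pizza in town",
    "paris hotel deals",
    "pizza delivery coupon",
]
categories = np.array([
    ["web", "fr"],
    ["mobile", "us"],
    ["web", "fr"],
    ["mobile", "us"],
])
Y = [0, 1, 0, 1]

# Text -> sparse TF-IDF matrix (counting and TF-IDF weighting in one step).
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(texts)

# String categories -> sparse one-hot columns (one column per category value).
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(categories)

# Append the encoded categorical columns on the right side of the text features.
X = hstack([X_text, X_cat]).tocsr()

classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(X, Y)
```

At prediction time the same fitted objects should be reused: call tfidf.transform and encoder.transform on the new rows (not fit_transform) and stack the results in the same order, so the columns line up with what the classifier was trained on.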