Building a predictive model with text data and other predictors

Question

I'm trying to build a predictive model (random forest, SGD, etc.) using scikit-learn, and it seems like every model only lets you fit text data, such as

classifier.fit(X,Y)

...where Y is the target and X is a text feature matrix (count_vec -> tf_idf). Is there any way to build a model that, in addition to the text feature matrix, also uses several categorical variables? Can I simply append them as new columns on the right side of X?
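For reference, here is a minimal sketch of the text-only setup the question describes: raw documents go through CountVectorizer, then TfidfTransformer, and the resulting sparse matrix is passed to `classifier.fit(X, Y)`. The toy documents, labels and the choice of RandomForestClassifier are illustrative assumptions, not taken from the question.

```python
# Text-only pipeline from the question: documents -> counts -> tf-idf -> fit.
# The documents and labels are made-up toy data.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import RandomForestClassifier

docs = [
    "cheap flights to paris",
    "quarterly earnings beat expectations",
    "last minute hotel deals",
    "stock market rallies on earnings",
]
Y = ["travel", "finance", "travel", "finance"]

count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)   # sparse term-count matrix, one row per document

tf_idf = TfidfTransformer()
X = tf_idf.fit_transform(counts)         # sparse tf-idf matrix with the same shape

# Random forests in scikit-learn accept sparse input directly.
classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(X, Y)

print(X.shape)  # (4, 16): 4 documents, 16 distinct terms
```

Every column of X here is a term weight, which is why the question is really about how to add non-text columns next to them.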

Answer 1

You will need to convert the categorical data to numbers first: simply appending string categories to the numeric values produced by a feature extractor like TfidfVectorizer will not work. Here's a SO question and answer on converting categories into numerical feature data that you can then append as extra columns on the right of X.
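A minimal sketch of that approach, assuming two hypothetical categorical columns (a "source" and a "region" per document) and toy data: the strings are one-hot encoded with OneHotEncoder, and the result is stacked to the right of the tf-idf matrix with `scipy.sparse.hstack`, so everything stays sparse. One-hot encoding is used instead of mapping each category to a single integer, because integer codes would imply an ordering that a linear model such as SGDClassifier would take literally.

```python
# Sketch: tf-idf text features + one-hot categorical features, side by side.
# Toy data; the two categorical columns ("source", "region") are hypothetical.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import SGDClassifier

docs = [
    "cheap flights to paris",
    "quarterly earnings beat expectations",
    "last minute hotel deals",
    "stock market rallies on earnings",
]
# One row per document: [source, region], still as strings.
categories = np.array([
    ["web", "eu"],
    ["newswire", "us"],
    ["web", "us"],
    ["newswire", "eu"],
])
Y = ["travel", "finance", "travel", "finance"]

# 1. Text -> sparse tf-idf matrix (same as CountVectorizer + TfidfTransformer).
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(docs)

# 2. String categories -> sparse 0/1 indicator columns.
#    handle_unknown="ignore" encodes categories unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(categories)

# 3. Append the categorical columns to the right of the text columns.
X = hstack([X_text, X_cat]).tocsr()

classifier = SGDClassifier(random_state=0)
classifier.fit(X, Y)

# New data must go through the same *fitted* transformers, stacked in the same order.
new_docs = ["hotel deals in rome"]
new_cats = np.array([["web", "eu"]])
X_new = hstack([tfidf.transform(new_docs), encoder.transform(new_cats)]).tocsr()

print(X.shape, X_new.shape)  # (4, 20) (1, 20): 16 term columns + 4 indicator columns
print(classifier.predict(X_new))
```

The manual stacking makes it easy to forget a step at prediction time. scikit-learn's `ColumnTransformer` (in `sklearn.compose`) can do the same combination inside a `Pipeline`, for example `TfidfVectorizer` on the text column and `OneHotEncoder` on the categorical columns. When doing so, pass the text column as a single column name rather than a list, because the vectorizer expects 1-D input.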

solution1
0 2019-08-23 19:52:47