Extracting text features from a dataframe and using them alongside other types of features (heterogeneous data) for sklearn purposes: TypeError

Question

I am attempting to extract features from a dataframe whose columns look like this:

feature1:float feature2:float feature3:string succeeded:boolean
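To reproduce the problem, a hypothetical `small_df` with this schema could be built as below. All values are invented for illustration; only the column names and types come from the question.

```python
import pandas as pd

# Hypothetical toy data matching the schema above; the values are made up.
small_df = pd.DataFrame({
    "feature1": [0.5, 1.2, 3.3, 0.7],
    "feature2": [10.0, 20.5, 5.1, 7.7],
    "feature3": ["red apple", "green apple", "red pear", "ripe banana"],
    "succeeded": [True, False, True, False],
})

# Check that each column got the dtype the schema implies.
print(small_df.dtypes)
```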

I'm far from an expert on the topic, but I attempted the following:

from sklearn.feature_extraction.text import CountVectorizer
import scipy as sp
import scipy.sparse  # makes sp.sparse available

cols = ['feature1', 'feature2']

vectorizer = CountVectorizer()
vectorizer.fit(small_df.feature3)
X = sp.sparse.hstack((vectorizer.transform(small_df.feature3),
                      small_df[cols]),
                     format='csr')

X_columns = list(vectorizer.get_feature_names_out()) + cols

However, I end up with the following error: TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))

Any help would be appreciated!
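The `dtype('O')` in the message says that one of the stacked blocks reached SciPy with object dtype. Assuming (the real data is not shown) that the numeric columns were read in as strings, a quick way to find and convert them is sketched below on hypothetical data.

```python
import pandas as pd

# Hypothetical data: feature2 was read in as strings, so it is not numeric.
small_df = pd.DataFrame({
    "feature1": [0.5, 1.2, 3.3],
    "feature2": ["10.0", "20.5", "5.1"],
    "feature3": ["red apple", "green apple", "red pear"],
})

cols = ["feature1", "feature2"]

# List the supposedly numeric columns whose dtype is not numeric.
bad = [c for c in cols if not pd.api.types.is_numeric_dtype(small_df[c])]
print("non-numeric columns:", bad)

# Coerce them to numbers; entries that cannot be parsed become NaN.
small_df[cols] = small_df[cols].apply(pd.to_numeric, errors="coerce")
print(small_df[cols].dtypes)
```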

Answer 1

Solution:

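cols = ['feature1', 'feature2']
X = sp.sparse.hstack((vectorizer.transform(small_df.feature3),
                      small_df[cols].values.astype(float)),
                     format='csr')

The dtype('O') in the error means one of the blocks reached SciPy as an object array, which it cannot combine with the int64 counts produced by CountVectorizer. Casting the numeric columns to a plain float array first gives SciPy two numeric blocks that it can stack.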

solution1
0 2020-10-09 08:17:22
