
Extracting text features from a dataframe and using them alongside other types of features (heterogeneous data) in sklearn: TypeError

I am attempting to extract some features from a dataframe that looks akin to this:

feature1:float feature2:float feature3:string succeeded:boolean

I'm far from an expert on the topic but I attempted the following:

from sklearn.feature_extraction.text import CountVectorizer
import scipy as sp
import scipy.sparse  # make sure the sparse submodule is loaded

vectorizer = CountVectorizer()
vectorizer.fit(small_df.feature3)
X = sp.sparse.hstack((vectorizer.transform(small_df.feature3),
                      small_df[['feature1', 'feature2']]),
                     format='csr')

X_columns = vectorizer.get_feature_names() + ['feature1', 'feature2']

However, I end up with the following error: TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))

Any help would be appreciated!

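Solution:

The dtype('O') in the error message suggests that the numeric columns are stored with object dtype, which scipy cannot combine with the int64 counts produced by CountVectorizer. Casting those columns to float before stacking avoids the error:

X = sp.sparse.hstack((vectorizer.transform(small_df.feature3),
                      small_df[['feature1', 'feature2']].values.astype(float)),
                     format='csr')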
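For anyone who wants to try this end to end, here is a minimal sketch under some assumptions. The toy small_df, its values, and the numeric_cols name are made up for illustration, and the numeric columns are deliberately created with object dtype to mimic what the error message implies. The numeric block is also wrapped in a sparse matrix before stacking, which is optional but keeps every block the same kind of object.

```python
import pandas as pd
import scipy.sparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data: the numeric columns are stored as object dtype
# on purpose, to mimic the situation implied by the error message.
small_df = pd.DataFrame({
    "feature1": pd.Series([0.5, 1.5, 2.5], dtype=object),
    "feature2": pd.Series([10.0, 20.0, 30.0], dtype=object),
    "feature3": ["red apple", "green pear", "red pear"],
    "succeeded": [True, False, True],
})
numeric_cols = ["feature1", "feature2"]  # illustrative name, not from the question

# Bag-of-words counts for the text column (a sparse integer matrix).
vectorizer = CountVectorizer()
text_counts = vectorizer.fit_transform(small_df["feature3"])

# Cast the numeric columns to a real float array, then make it sparse
# so that both blocks passed to hstack are sparse matrices.
numeric_block = scipy.sparse.csr_matrix(
    small_df[numeric_cols].to_numpy(dtype=float)
)

# Stack text features and numeric features side by side.
X = scipy.sparse.hstack((text_counts, numeric_block), format="csr")
y = small_df["succeeded"].to_numpy()

# Column names: newer scikit-learn versions expose get_feature_names_out(),
# older ones only get_feature_names(); handle both.
if hasattr(vectorizer, "get_feature_names_out"):
    text_names = [str(name) for name in vectorizer.get_feature_names_out()]
else:
    text_names = [str(name) for name in vectorizer.get_feature_names()]
X_columns = text_names + numeric_cols
```

X and y can then be passed to any estimator that accepts sparse input. If the object columns contain values that cannot be parsed as numbers, the float cast will fail, so checking small_df.dtypes first is a reasonable diagnostic step.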

