
How to deal with combination of text and numeric features?

Looking at Kaggle's Job Salary Prediction, I see numeric features (like Category) and textual ones (like FullDescription).

How do I go about training on such data? I thought about vectorizing the text using TfidfTransformer, but it creates a sparse matrix, which many learning algorithms (such as RandomForestRegressor) refuse to work with. Also, once I have the feature vector for the text, how do I combine it with the other features?

Any pointers on how to work with such data?

Thanks!

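I would first fit a linear model on the tf-idf features of each text field independently, then add the linear models' predictions as additional features alongside the other features, and train an ExtraTreesRegressor or GradientBoostedTreeRegressor on the combined features.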
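A minimal sketch of that idea with scikit-learn follows. The toy data, the column meanings, the choice of Ridge as the linear model, and the helper name `text_model_predictions` are all assumptions made for illustration; none of them come from the competition data. Using out-of-fold predictions via `cross_val_predict` is also an added choice, so that the tree model does not see predictions made on rows the linear model was trained on.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for two text fields, two numeric
# columns, and the salary target.
descriptions = [
    "senior python developer for data platform",
    "junior nurse for city hospital ward",
    "lead java engineer in trading systems",
    "care assistant for residential home",
    "data scientist with machine learning skills",
    "staff nurse for private clinic",
]
titles = [
    "python developer", "nurse", "java engineer",
    "care assistant", "data scientist", "staff nurse",
]
numeric = np.array([[3, 1], [7, 0], [3, 1], [7, 0], [3, 1], [7, 0]], dtype=float)
salary = np.array([60000, 28000, 75000, 20000, 65000, 30000], dtype=float)


def text_model_predictions(texts, y, cv=3):
    """Return out-of-fold predictions of a tf-idf + linear model for one text field."""
    model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
    return cross_val_predict(model, texts, y, cv=cv)


# One dense column per text field: the linear model's salary estimate.
text_features = np.column_stack([
    text_model_predictions(descriptions, salary),
    text_model_predictions(titles, salary),
])

# Combine with the other features; everything is dense now, so tree
# ensembles accept it directly.
X_combined = np.hstack([numeric, text_features])

forest = ExtraTreesRegressor(n_estimators=100, random_state=0)
forest.fit(X_combined, salary)
predictions = forest.predict(X_combined)
```

Because each text field is reduced to a single dense column, the sparse tf-idf matrix never reaches the tree model. To score new rows, one would refit each linear pipeline on the full training set and use its `predict` output as the extra columns.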
