
Using Scikit-Learn's SVR, how do you combine categorical and continuous features in predicting the target?

I want to use a support vector machine to solve a regression problem: predicting the income of teachers from a few features that are a mixture of categorical and continuous. For example, I have race [white, asian, hispanic, black], number of years teaching, and years of education.

For the categorical feature, I used scikit-learn's preprocessing module and one-hot encoded the 4 races. A white teacher would look like [1,0,0,0], so I have an array {[1,0,0,0], [0,1,0,0], ... [0,0,1,0], [1,0,0,0]} representing the race of each teacher, encoded for SVR. I can perform a regression with just race vs. income, i.e.:

from sklearn.svm import SVR
clf = SVR(C=1.0)
clf.fit(racearray, income)

I can also perform a regression using the quantitative features alone. However, I don't know how to combine the two sets of features, i.e.:

continuousarray = list(zip(yearsteaching, yearseducation))
clf.fit((racearray, continuousarray), income)  # does not work: how should the two be combined?

You can use scikit-learn's OneHotEncoder. If your data are in a NumPy array "racearray" and the columns are

[continuous_feature1, continuous_feature2, categorical, continuous_feature3]

your code should look like this (keep in mind that NumPy indexing starts at 0):

from sklearn.preprocessing import OneHotEncoder
# Column 2 is the categorical one. Note: categorical_features was removed in scikit-learn 0.22.
enc = OneHotEncoder(categorical_features=[2])
race_encoded = enc.fit_transform(racearray)
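On recent scikit-learn releases the categorical_features argument is no longer available (it was deprecated in 0.20 and removed in 0.22). A minimal sketch of the same idea using ColumnTransformer might look like the following. The data values, the integer coding of race, and the meaning of the third continuous column (age) are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: columns are [years_teaching, years_education, race_code, age],
# where race_code is an integer 0..3 standing for the four categories.
X = np.array([
    [5.0, 16.0, 0, 30.0],
    [12.0, 18.0, 1, 41.0],
    [3.0, 16.0, 2, 27.0],
    [20.0, 20.0, 3, 52.0],
    [8.0, 17.0, 0, 35.0],
])

# One-hot encode column 2 only and pass the other columns through unchanged.
# sparse_threshold=0 asks ColumnTransformer to return a dense array.
ct = ColumnTransformer(
    [("race", OneHotEncoder(), [2])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)
```

Note that the one-hot columns come first in the output, followed by the passed-through continuous columns, so X_encoded here has 4 + 3 = 7 columns.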

You can then inspect the race_encoded array as usual and use it in SVR:

from sklearn.svm import SVR
clf = SVR(C=1.0)
clf.fit(race_encoded, income)
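If the races are already one-hot encoded in a separate array, as described in the question, another option is simply to join the arrays column-wise before fitting. The sketch below assumes five made-up teachers; the variable names follow the question, but all values are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data for five teachers (values invented for illustration).
racearray = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
yearsteaching = np.array([5, 12, 3, 20, 8])
yearseducation = np.array([16, 18, 16, 20, 17])
income = np.array([42000.0, 55000.0, 39000.0, 68000.0, 47000.0])

# Put the continuous features side by side as columns,
# then append them to the one-hot columns: one row per teacher.
continuousarray = np.column_stack([yearsteaching, yearseducation])
features = np.hstack([racearray, continuousarray])  # shape (5, 6)

clf = SVR(C=1.0)
clf.fit(features, income)
predictions = clf.predict(features)
```

SVR is sensitive to feature scale, so in practice it is usually worth standardizing the continuous columns (for example with StandardScaler) before fitting.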
