
Using Scikit-Learn's SVR, how do you combine categorical and continuous features in predicting the target?

I want to use a support vector machine to solve a regression problem: predicting teachers' income from a few features, a mixture of categorical and continuous. For example, I have race ([white, asian, hispanic, black]), # of years teaching, and years of education.

For the categorical feature, I used scikit-learn's preprocessing module and one-hot encoded the 4 races. A white teacher would look like [1,0,0,0], so I have an array such as {[1,0,0,0], [0,1,0,0], ..., [0,0,1,0], [1,0,0,0]} representing the race of each teacher, encoded for SVR. I can perform a regression with just race vs. income, i.e.:

clf = SVR(C=1.0)
clf.fit(racearray, income)

I can also perform a regression using the quantitative features on their own. However, I don't know how to combine the two sets of features, i.e. something like:

continuousarray = zip(yearsteaching, yearseducation)
clf.fit((racearray, continuousarray), income)
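What the question is after can be sketched by stacking the one-hot block and the numeric columns into a single feature matrix with NumPy's `column_stack` before fitting. The sample data below is made up for illustration:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data: 4 teachers, one-hot race plus two numeric features.
racearray = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [1, 0, 0, 0]])
yearsteaching = np.array([5, 10, 3, 8])
yearseducation = np.array([16, 18, 16, 20])
income = np.array([40000, 55000, 35000, 50000])

# Stack everything into one (n_samples, n_features) matrix:
# 4 one-hot columns + 2 continuous columns = 6 features per teacher.
X = np.column_stack([racearray, yearsteaching, yearseducation])

clf = SVR(C=1.0)
clf.fit(X, income)
```

Because SVR is distance-based, it is usually worth scaling the continuous columns (e.g. with `StandardScaler`) so they do not dominate the 0/1 one-hot columns.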

You can use scikit-learn's OneHotEncoder. If your data are in a numpy array racearray and the columns are

[continuous_feature1, continuous_feature2, categorical, continuous_feature3]

your code should look like this (keep in mind that numpy indexing starts at 0):

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(categorical_features=[2])
race_encoded = enc.fit_transform(racearray)

You can then inspect the race_encoded array as usual and use it in SVR:

clf = SVR(C=1.0)
clf.fit(race_encoded, income)
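Note that the `categorical_features` parameter of OneHotEncoder was deprecated and later removed in modern scikit-learn (0.22+). The current way to encode one column while passing the continuous columns through is `ColumnTransformer`. A minimal sketch, assuming (as in the answer above) that column 2 holds an integer race code and the made-up data below:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVR

# Hypothetical mixed array: column 2 is the categorical race code,
# the other columns are continuous features.
X = np.array([[5.0, 16.0, 0, 1.0],
              [10.0, 18.0, 1, 3.0],
              [3.0, 16.0, 2, 2.0],
              [8.0, 20.0, 0, 4.0]])
income = np.array([40000, 55000, 35000, 50000])

# One-hot encode column 2; keep the remaining columns unchanged.
pre = ColumnTransformer(
    [("race", OneHotEncoder(), [2])],
    remainder="passthrough",
)
model = make_pipeline(pre, SVR(C=1.0))
model.fit(X, income)
```

The pipeline keeps the encoding and the regression together, so `model.predict` accepts raw rows in the original column layout.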
