
How to use OneHotEncoder categorical_features

I am having trouble using OneHotEncoder to encode only the categorical columns while leaving the continuous columns untouched. The encoder encodes every column no matter what I pass as categorical_features. For example:

enc = preprocessing.OneHotEncoder()
enc.fit([[0, 40, 3], [1, 50, 0], [0, 45, 1], [1, 30, 2]])
OneHotEncoder(categorical_features=[0,2], 
   handle_unknown='error', n_values='auto', sparse=True)
print enc.n_values_
print enc.feature_indices_
enc.transform([[0, 45, 3]]).toarray()

I only want to encode the first and third columns (indices 0 and 2), leaving the middle column (values 40, 50, 45, 30) as continuous values. So I specify categorical_features=[0,2], but no matter what I do, the output of this code is still:

[ 2 51  4]
[ 0  2 53 57]
Out[129]:
array([[ 1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.]])

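Why do you call the OneHotEncoder constructor twice? enc was created by the default constructor, so it has categorical_features='all' (all features are treated as categorical). The second constructor call builds a separate object that is never assigned or fitted, so its settings have no effect on enc. As I understand it, you need something like this: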

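from sklearn.preprocessing import OneHotEncoder

# Note: categorical_features and n_values exist only in scikit-learn < 0.22
# (deprecated in 0.20, removed in 0.22); newer versions select columns
# with sklearn.compose.ColumnTransformer instead.
enc = OneHotEncoder(categorical_features=[0,2],
    handle_unknown='error', n_values='auto', sparse=True)
enc.fit([[0, 40, 3], [1, 50, 0], [0, 45, 1], [1, 30, 2]])
print(enc.n_values_)
print(enc.feature_indices_)
enc.transform([[0, 45, 3]]).toarray()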

This gives:

[2 4]
[0 2 6]
Out[23]: array([[  1.,   0.,   0.,   0.,   0.,   1.,  45.]])
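To see where these numbers come from, here is a minimal pure-Python sketch of the layout, not scikit-learn's actual implementation. The function name one_hot_selected is hypothetical. It assumes that every integer from 0 up to the column maximum appears in the training data, which holds for this example. As far as I know, the real encoder with n_values='auto' also drops output columns for values never seen during fit, and the sketch does not model that.

```python
def one_hot_selected(train_rows, row, categorical):
    """Illustrative sketch only; one_hot_selected is a hypothetical helper."""
    # n_values: for each categorical column, largest training value + 1
    n_values = [max(r[c] for r in train_rows) + 1 for c in categorical]

    # feature_indices: running offsets of each column's one-hot block
    feature_indices = [0]
    for n in n_values:
        feature_indices.append(feature_indices[-1] + n)

    # Set a single 1.0 inside each categorical column's block
    encoded = [0.0] * feature_indices[-1]
    for offset, c in zip(feature_indices, categorical):
        encoded[offset + row[c]] = 1.0

    # Non-categorical columns are appended unchanged after the encoded blocks
    passthrough = [float(row[c]) for c in range(len(row)) if c not in categorical]
    return n_values, feature_indices, encoded + passthrough


train = [[0, 40, 3], [1, 50, 0], [0, 45, 1], [1, 30, 2]]
n_values, feature_indices, out = one_hot_selected(train, [0, 45, 3], [0, 2])
print(n_values)         # [2, 4]
print(feature_indices)  # [0, 2, 6]
print(out)              # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 45.0]
```

The first two slots encode column 0, the next four encode column 2, and the continuous value 45 is placed at the end, which matches the array above.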
