
scikit-learn, categorical (but numerical) features in Linear Regression

I'm using Linear Regression in scikit-learn, and my dataset contains some features that are categorical but expressed as numbers. For example, the district where a house is located is given as an integer between 1 and 7: the higher the number, the more valuable the house. Should I preprocess a feature like this, which expresses a category (the district of the city) with numbers, using an encoder such as OneHotEncoder before running Linear Regression? Or is encoding only required when the category is expressed as strings? Thank you in advance.

If I understand correctly, you don't need to one-hot encode these features, since they are ordinal, i.e. the order carries meaning. If the numbers were product codes, for example, with no sense of 7 being "better than" or "more than" 4, then you would want to one-hot encode them. In this case, though, one-hot encoding would discard the ordering information.
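To make the difference concrete, here is a minimal sketch of both options. It assumes made-up data: the `district` and `price` arrays and all the numbers in them are hypothetical, and the price is generated to rise roughly linearly with the district score, which real data may not do.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Hypothetical data: district scores 1..7 (30 houses per district) and a
# price that rises with the score, plus some noise.
district = np.tile(np.arange(1, 8), 30).reshape(-1, 1)
price = 50_000 + 20_000 * district.ravel() + rng.normal(0, 5_000, size=len(district))

# Option 1: keep the ordinal feature as a single numeric column.
# The model learns one slope: the price change per one-step increase in district.
ordinal_model = LinearRegression().fit(district, price)

# Option 2: one-hot encode the feature. drop="first" removes one column so the
# remaining dummies are not collinear with the intercept.
encoder = OneHotEncoder(drop="first")
district_onehot = encoder.fit_transform(district).toarray()
onehot_model = LinearRegression().fit(district_onehot, price)

print("ordinal coefficient:", ordinal_model.coef_)
print("one-hot coefficients:", onehot_model.coef_)
print("ordinal R^2:", ordinal_model.score(district, price))
print("one-hot R^2:", onehot_model.score(district_onehot, price))
```

With the ordinal column the model fits a single coefficient. With one-hot encoding it fits a separate coefficient per district and no longer knows that 7 comes after 6. On the training data the one-hot fit can never score worse, because it can reproduce any linear trend, but it spends more parameters to do so.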
