
Categorical Feature Encoding as Enum for Scikit-Learn

Original post: 2018-06-14 08:57:58 | Tags: python, encoding, enums, scikit-learn, h2o

I am currently trying to preprocess a very large dataset with a lot of categorical features for scikit-learn's RandomForest model (regression). The nature of the categorical data requires that the encoding scheme not introduce any ordinality. The H2O ML framework (Link) offers enum encoding, which would suit my data perfectly. However, I rely on scikit-learn's RandomForest.

Is anyone aware of an enum encoding for scikit-learn models? (One-hot encoding is not an option.)

Thanks in advance!

1 answer

Only label encoding (LabelEncoder) and one-hot encoding (OHE) are available in sklearn. Label encoding does not provide the functionality you want, though: categories are simply mapped to integers, which I believe is meaningful for ordinal categories only. As far as I can tell, sklearn leaves enum-style category treatment up to the individual models, because there are many models in sklearn and most of them would not benefit from such an encoding.
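To make the ordinality concern concrete, here is a minimal sketch using a made-up `colors` column (the data and variable names are illustrative assumptions, not from the question). LabelEncoder assigns integer codes in sorted order of the labels, so a tree that splits on a threshold of that code can only cut the categories along this arbitrary order.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical nominal feature: there is no natural order between colors.
colors = np.array(["red", "green", "blue", "green", "red"])

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)

# classes_ is sorted, so the codes follow alphabetical order.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}

# Target that depends only on "green", which received the middle code 1.
X = codes.reshape(-1, 1)
y = (colors == "green").astype(float)

# A single threshold split cannot isolate the middle code from both neighbours.
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
# Two threshold splits are needed to separate "green" from the rest.
deeper = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

stump_r2 = stump.score(X, y)
deeper_r2 = deeper.score(X, y)
```

A tree can still recover the grouping with additional splits, as the deeper tree does here, but the splits it considers are constrained by the arbitrary integer order rather than by arbitrary subsets of categories.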

I think LightGBM claims here that it implements this type of category treatment internally, but I'm not 100% sure that is true. The advantage is that it has both RF and GBM tree builders, so you can easily switch between them, and it is faster than the sklearn implementation.

Note also that CatBoost has a rich toolkit for internal category encoding, but I have zero experience with it so far.

Feature selection using scikit-learn on categorical features

Impute categorical missing values in scikit-learn

Handling categorical features using scikit-learn

Using scikit-Learn for a multiplicative, categorical model

Feature preprocessing of both continuous and categorical variables (of integer type) with scikit-learn

Backtracking categorical features from one-hot-encoding in scikit-learn?

Text Feature Extraction using scikit-learn

Feature selection using scikit-learn

Scikit-learn feature selection for regression data

Resize HOG feature for Scikit-Learn classifier


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.



solution1 1 2018-06-14 09:46:06
