
TypeError: fit_transform() takes 2 positional arguments but 3 were given

I have a pandas DataFrame df and want to encode its continuous and categorical features using different encoders. make_column_transformer seems very convenient for this, but the code shown below fails with LabelEncoder() while working fine with OneHotEncoder(handle_unknown='ignore'). The error message is:

TypeError: fit_transform() takes 2 positional arguments but 3 were given

It's not clear to me how to fix this issue.

The code:

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder, LabelEncoder

continuous_features = ['COL1','COL2']       
categorical_features = ['COL3','COL4']

column_trans = make_column_transformer(
    (categorical_features,LabelEncoder()),
    (continuous_features, RobustScaler()))

X_enc = column_trans.fit_transform(df)

According to the documentation at https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html, each tuple should list the transformer first and the columns second:

make_column_transformer(
    (StandardScaler(), ['numerical_column']),
    (OneHotEncoder(), ['categorical_column']))

So for your case:

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder, LabelEncoder

continuous_features = ['COL1','COL2']       
categorical_features = ['COL3','COL4']

column_trans = make_column_transformer(
    (OneHotEncoder(), categorical_features),
    (RobustScaler(), continuous_features))

X_enc = column_trans.fit_transform(df)
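To show the corrected argument order end to end, here is a runnable sketch with a toy DataFrame (the COL1..COL4 names come from the question; the sample values are assumptions):

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder

# Toy DataFrame standing in for the question's df
df = pd.DataFrame({
    'COL1': [1.0, 2.0, 3.0],
    'COL2': [10.0, 20.0, 30.0],
    'COL3': ['a', 'b', 'a'],
    'COL4': ['x', 'x', 'y'],
})

# Transformer first, columns second in each tuple
column_trans = make_column_transformer(
    (OneHotEncoder(), ['COL3', 'COL4']),
    (RobustScaler(), ['COL1', 'COL2']))

X_enc = column_trans.fit_transform(df)
print(X_enc.shape)  # (3, 6): 4 one-hot columns + 2 scaled columns
```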

Note also that LabelEncoder() accepts only a single column of data, so you cannot pass it two columns at once.

Hope this helps.

IMO the real problem here is that LabelEncoder could not (and still cannot) be used within ColumnTransformer or Pipeline instances, because it is meant to transform targets only, while ColumnTransformer and Pipeline are intended for feature transformation.

This can be clearly seen from the signatures of the .fit(), .transform(), and .fit_transform() methods of the LabelEncoder class, which differ from those of "more standard" transformers:

fit(y) vs fit(X[, y])
transform(y) vs transform(X)
fit_transform(y) vs fit_transform(X[, y])

respectively for LabelEncoder-like transformers (i.e. transformers meant to be applied to the target) and for transformers meant to be applied to features.
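The asymmetry can be seen directly in code. A minimal sketch (the 'cat'/'dog' toy data is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

y = np.array(['cat', 'dog', 'cat'])
X = y.reshape(-1, 1)  # 2D, as feature transformers expect

# Target encoder: fit_transform(y) takes a single 1D array
le = LabelEncoder()
print(le.fit_transform(y))          # [0 1 0]

# Feature transformer: fit_transform(X, y=None) takes 2D features
oe = OrdinalEncoder()
print(oe.fit_transform(X).ravel())  # [0. 1. 0.]
```

ColumnTransformer internally calls fit_transform(X, y) with two positional arguments, which is exactly why LabelEncoder raises "fit_transform() takes 2 positional arguments but 3 were given".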

I have posted a detailed answer at Why should LabelEncoder from sklearn be used only for the target variable?

That said, even if using LabelEncoder within a ColumnTransformer were allowed, you would likely have run into the problem described in the other answer, which stems from the fact that the input to LabelEncoder must be 1D. (In such cases, both the ColumnTransformer constructor and the make_column_transformer function require the columns parameter to be passed as a string rather than as a list of strings.) A common use case is classes designed to deal with text, like CountVectorizer, which do require 1D input (see Sklearn custom transformers with pipeline: all the input array dimensions for the concatenation axis must match exactly, for instance).
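To illustrate the string-vs-list distinction, here is a minimal sketch (the 'text' column name and the two sample sentences are assumptions):

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer

df_text = pd.DataFrame({'text': ['hello world', 'hello there']})

# Passing the column as a string (not a list) hands CountVectorizer
# a 1D Series, which is the input shape it requires
ct = make_column_transformer((CountVectorizer(), 'text'))
X = ct.fit_transform(df_text)
print(X.shape)  # (2, 3): vocabulary is {'hello', 'there', 'world'}
```

Passing ['text'] instead would select a 2D (n_samples, 1) slice and make CountVectorizer fail.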

Finally, since sklearn version 0.20, OrdinalEncoder provides an alternative to LabelEncoder that can be used on feature columns.
