ColumnTransformer fit_transform not working with Pipeline

Question

I am writing a pipeline with a custom transformer. When I call fit_transform on the categorical pipeline I get the desired result, but when I call fit_transform on the ColumnTransformer, whatever I initialised in __init__ of the custom transformer is lost. Note: the code of numericalTransformer is omitted for readability.

class categoryTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, use_dates=['year', 'month', 'day']):
        self._use_dates = use_dates
        print('==========>', self._use_dates)

    def fit(self, X, y=None):
        return self

    def get_year(self, obj):
        return str(obj)[:4]

    def get_month(self, obj):
        return str(obj)[4:6]

    def get_day(self, obj):
        return str(obj)[6:8]

    def create_boolean(self, obj):
        if obj == '0':
            return 'No'
        else:
            return 'Yes'

    def transform(self, X, y=None):
        print(self._use_dates)

        for spec in self._use_dates:
            print(spec)
            exec("X.loc[:,'{}'] = X['date'].apply(self.get_{})".format(spec, spec))

        X = X.drop('date', axis=1)
        X.loc[:, 'yr_renovated'] = X['yr_renovated'].apply(self.create_boolean)
        X.loc[:, 'view'] = X['view'].apply(self.create_boolean)
        return X.values

cat_pipe = Pipeline([
    ('cat_transform', categoryTransformer()),
    ('one_hot', OneHotEncoder(sparse=False))])

num_pipe = Pipeline([
    ('num_transform', numericalTransformer()),
    ('imputer', SimpleImputer(strategy='median')),
    ('std_scaler', StandardScaler())])

full_pipe = ColumnTransformer([
    ('num', num_pipe, numerical_features),
    ('cat', cat_pipe, categorical_features)])

cat_pipe.fit_transform(data[categorical_features])  # working fine
df2 = full_pipe.fit_transform(X_train)  # __init__ initialisation lost

Output:
==========> ['year', 'month', 'day']
['year', 'month', 'day']
year
month
day
==========> None
None

After that comes a long traceback that I am not able to debug. A workaround is to define use_dates=['year', 'month', 'day'] inside the transform function itself, but I want to understand why this is happening.

Answer 1

The parameters of __init__ must have the same names as the attributes they are stored in (so use_dates versus _use_dates is the problem).

This is required for cloning to work properly: clone rebuilds each estimator from get_params(), which looks up every __init__ parameter as an attribute of the same name, and ColumnTransformer clones all its transformers before fitting.
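
To make the mechanism concrete, here is a simplified, standard-library-only model of what cloning does. It is not scikit-learn's actual code: toy_get_params, toy_clone, Mismatched and Matched are hypothetical names invented for this sketch, and the fallback to None is an assumption chosen to mirror the "None" in the output shown in the question (as far as I know, more recent scikit-learn releases raise an error here instead).

```python
import inspect


def toy_get_params(est):
    """Read each __init__ parameter back from an attribute of the same name."""
    signature = inspect.signature(type(est).__init__)
    names = [name for name in signature.parameters if name != "self"]
    # Assumption: fall back to None when no attribute with that name exists.
    return {name: getattr(est, name, None) for name in names}


def toy_clone(est):
    """Build a fresh copy from the parameters that could be read back."""
    return type(est)(**toy_get_params(est))


class Mismatched:
    def __init__(self, use_dates=("year", "month", "day")):
        self._use_dates = use_dates  # attribute name differs from the parameter


class Matched:
    def __init__(self, use_dates=("year", "month", "day")):
        self.use_dates = use_dates  # attribute name equals the parameter


lost = toy_clone(Mismatched())
kept = toy_clone(Matched())
print(lost._use_dates)  # None
print(kept.use_dates)   # ('year', 'month', 'day')
```

Calling cat_pipe.fit_transform directly does not clone anything, which would explain why the categorical pipeline worked on its own while the ColumnTransformer did not.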

https://scikit-learn.org/stable/developers/develop.html#instantiation
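
Applied to the transformer from the question, the fix could look roughly like the sketch below. It is one possible rewrite, not the asker's final code: the class is renamed CategoryTransformer, the default is a tuple rather than a list, the exec call is replaced by string slicing, and the prints are dropped. The column names date, yr_renovated and view are taken from the question, and X is assumed to be a pandas DataFrame.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone


class CategoryTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, use_dates=("year", "month", "day")):
        # Stored under the same name as the parameter, unchanged,
        # so that get_params() and clone() can find it again.
        self.use_dates = use_dates

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        X = X.copy()  # do not modify the caller's DataFrame
        date = X["date"].astype(str)
        # Same character positions as get_year / get_month / get_day.
        parts = {"year": slice(0, 4), "month": slice(4, 6), "day": slice(6, 8)}
        for spec in self.use_dates:
            X[spec] = date.str[parts[spec]]
        X = X.drop("date", axis=1)
        for col in ("yr_renovated", "view"):
            # Same rule as create_boolean in the question.
            X[col] = X[col].apply(lambda v: "No" if v == "0" else "Yes")
        return X.values


# clone() is what ColumnTransformer applies to its transformers before fitting.
cloned = clone(CategoryTransformer(use_dates=("year", "month")))
print(cloned.use_dates)
```

With the attribute named use_dates, the cloned copy carries the same setting as the original, so the same value should be visible inside transform when the transformer is used through full_pipe.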

Answer 1: accepted, 2020-08-07 01:36:19