ColumnTransformer fit_transform not working with pipeline
I am writing a pipeline with a custom transformer. When I call fit_transform on the categorical pipeline I get the desired result, but when I call fit_transform on the ColumnTransformer, whatever I initialised in the __init__ of the custom transformer gets lost.

Note: I am not including the code of numericalTransformer, for readability.
from sklearn.base import BaseEstimator, TransformerMixin


class categoryTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, use_dates=['year', 'month', 'day']):
        self._use_dates = use_dates
        print('==========>', self._use_dates)

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn
        return self

    # Helpers that slice a 'YYYYMMDD...' date string
    def get_year(self, obj):
        return str(obj)[:4]

    def get_month(self, obj):
        return str(obj)[4:6]

    def get_day(self, obj):
        return str(obj)[6:8]

    def create_boolean(self, obj):
        if obj == '0':
            return 'No'
        else:
            return 'Yes'

    def transform(self, X, y=None):
        print(self._use_dates)
        for spec in self._use_dates:
            print(spec)
            # Equivalent to the original exec(...) call: pick get_year/get_month/get_day by name
            X.loc[:, spec] = X['date'].apply(getattr(self, 'get_' + spec))
        X = X.drop('date', axis=1)
        X.loc[:, 'yr_renovated'] = X['yr_renovated'].apply(self.create_boolean)
        X.loc[:, 'view'] = X['view'].apply(self.create_boolean)
        return X.values
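from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

cat_pipe = Pipeline([
    ('cat_transform', categoryTransformer()),
    # This argument was named sparse before scikit-learn 1.2
    ('one_hot', OneHotEncoder(sparse_output=False))])

num_pipe = Pipeline([
    ('num_transform', numericalTransformer()),
    ('imputer', SimpleImputer(strategy='median')),
    ('std_scaler', StandardScaler())])

full_pipe = ColumnTransformer([
    ('num', num_pipe, numerical_features),
    ('cat', cat_pipe, categorical_features)])

cat_pipe.fit_transform(data[categorical_features])  # working fine
df2 = full_pipe.fit_transform(X_train)  # __init__ initialisation lost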
Output:
==========> ['year', 'month', 'day']
['year', 'month', 'day']
year
month
day
==========> None
None
After that comes a long traceback that I am not able to debug. A workaround is to create use_dates=['year', 'month', 'day'] inside the transform function itself, but I want to understand why this is happening.
The parameters of __init__ need to have the same names as the attributes that get set (so use_dates versus _use_dates is the problem).
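To see why the name matters, here is a rough, stdlib-only imitation of how scikit-learn reads constructor parameters back from an object and rebuilds a fresh copy from them. This is a simplified sketch, not the library's code: simplified_get_params, simplified_clone, Broken and Fixed are hypothetical names, the real sklearn.base.clone also deep-copies the parameters and runs sanity checks, and the None fallback mirrors the older scikit-learn behaviour visible in the question's output.

```python
import inspect


def simplified_get_params(est):
    """Rough imitation of BaseEstimator.get_params(): look up every
    __init__ parameter as an attribute with exactly the same name."""
    names = [p for p in inspect.signature(type(est).__init__).parameters
             if p != 'self']
    # Older scikit-learn fell back to None for a missing attribute
    return {name: getattr(est, name, None) for name in names}


def simplified_clone(est):
    """Rough imitation of sklearn.base.clone(): build a new, unfitted
    object from the constructor parameters only."""
    return type(est)(**simplified_get_params(est))


class Broken:
    def __init__(self, use_dates=['year', 'month', 'day']):
        self._use_dates = use_dates  # stored under a different name


class Fixed:
    def __init__(self, use_dates=['year', 'month', 'day']):
        self.use_dates = use_dates  # same name as the parameter


print(simplified_clone(Broken())._use_dates)  # None
print(simplified_clone(Fixed()).use_dates)    # ['year', 'month', 'day']
```

The Broken copy is constructed as Broken(use_dates=None), because there is no use_dates attribute to read back, which matches the "==========> None" line in the question.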
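This is required for cloning to work properly, and ColumnTransformer clones all its transformers before fitting. clone() rebuilds each estimator from get_params(), which reads every __init__ parameter back as an attribute with the same name. Your object has _use_dates but no use_dates, so the clone was created with use_dates=None, which is the "==========> None" in your output (newer scikit-learn versions raise an AttributeError at this point instead). Calling cat_pipe.fit_transform directly worked because a Pipeline with the default memory=None fits the step objects you gave it rather than clones.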
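The clone-versus-no-clone difference can be checked directly. In this sketch StandardScaler and MinMaxScaler stand in for the question's custom transformers and the data is made up; as far as I know the behaviour shown holds for recent scikit-learn releases.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Pipeline with the default memory=None: the object you passed in is fitted
first = StandardScaler()
pipe = Pipeline([('first', first), ('second', MinMaxScaler())])
pipe.fit(X)
print(pipe.named_steps['first'] is first)  # True
print(hasattr(first, 'mean_'))             # True: fitted in place

# ColumnTransformer: each transformer is cloned and the clone is fitted
scaler = StandardScaler()
ct = ColumnTransformer([('scale', scaler, [0, 1])])
ct.fit(X)
print(ct.named_transformers_['scale'] is scaler)  # False
print(hasattr(scaler, 'mean_'))                   # False: original untouched
```

This is why a transformer whose parameters cannot be read back behaves correctly inside a bare Pipeline but loses its settings once the Pipeline is placed inside a ColumnTransformer.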
https://scikit-learn.org/stable/developers/develop.html#instantiation
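Applied to the question's transformer, the fix is to store the argument as self.use_dates, unchanged. A minimal sketch follows, assuming the 'date' column holds strings in the 'YYYYMMDDT000000' style that the question's slicing implies; only the date handling is shown (the yr_renovated/view logic would stay as in the question), CategoryTransformer is renamed in CapWords style, the default is a tuple to avoid a mutable default, and the sample data is made up.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class CategoryTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, use_dates=('year', 'month', 'day')):
        # Same name as the __init__ argument and stored unchanged,
        # so get_params() and clone() can read it back
        self.use_dates = use_dates

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        X = X.copy()  # don't modify the caller's DataFrame
        # Character positions of each part inside the date string
        parts = {'year': slice(0, 4), 'month': slice(4, 6), 'day': slice(6, 8)}
        dates = X['date'].astype(str)
        for spec in self.use_dates:
            X[spec] = dates.str[parts[spec]]
        return X.drop(columns='date').to_numpy()


copy_ = clone(CategoryTransformer())
print(copy_.get_params())  # {'use_dates': ('year', 'month', 'day')}

df = pd.DataFrame({'date': ['20141013T000000', '20150225T000000']})
print(copy_.fit_transform(df))
```

Because the clone now receives the real use_dates value, the same object can be dropped into cat_pipe and full_pipe without the workaround of hard-coding the list inside transform.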