
ColumnTransformer fit_transform not working with pipeline

I am writing a pipeline with a custom transformer. When I call fit_transform on the categorical pipeline I get the desired result, but when I call fit_transform on the ColumnTransformer, whatever I initialised in the custom transformer's __init__ is lost. Note: the code for numericalTransformer is omitted for readability.

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class categoryTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, use_dates=['year', 'month', 'day']):
        self._use_dates = use_dates
        print('==========>', self._use_dates)

    def fit(self, X, y=None):
        return self

    def get_year(self, obj):
        return str(obj)[:4]

    def get_month(self, obj):
        return str(obj)[4:6]

    def get_day(self, obj):
        return str(obj)[6:8]

    def create_boolean(self, obj):
        if obj == '0':
            return 'No'
        else:
            return 'Yes'

    def transform(self, X, y=None):
        print(self._use_dates)

        # Calls get_year / get_month / get_day for each requested date part
        for spec in self._use_dates:
            print(spec)
            exec("X.loc[:,'{}'] = X['date'].apply(self.get_{})".format(spec, spec))

        X = X.drop('date', axis=1)
        X.loc[:, 'yr_renovated'] = X['yr_renovated'].apply(self.create_boolean)
        X.loc[:, 'view'] = X['view'].apply(self.create_boolean)
        return X.values


cat_pipe = Pipeline([
    ('cat_transform', categoryTransformer()),
    # sparse= was renamed to sparse_output= in scikit-learn 1.2
    ('one_hot', OneHotEncoder(sparse=False))])

num_pipe = Pipeline([
    ('num_transform', numericalTransformer()),
    ('imputer', SimpleImputer(strategy='median')),
    ('std_scaler', StandardScaler())])

full_pipe = ColumnTransformer([
    ('num', num_pipe, numerical_features),
    ('cat', cat_pipe, categorical_features)])

cat_pipe.fit_transform(data[categorical_features])  # working fine
df2 = full_pipe.fit_transform(X_train)  # __init__ initialisation lost

Output:
==========> ['year', 'month', 'day']
['year', 'month', 'day']
year
month
day
==========> None
None

After that comes a long traceback that I am not able to debug. A workaround is to create use_dates = ['year', 'month', 'day'] inside the transform function itself, but I want to understand why this is happening.

The parameters of __init__ need to have the same names as the attributes that get set, so storing the use_dates parameter as _use_dates is the problem.

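This is required for cloning to work properly: clone rebuilds an estimator from get_params(), which looks up attributes named after the __init__ parameters. ColumnTransformer clones all its transformers before fitting, whereas calling fit_transform on cat_pipe directly uses the original instance, which is why that call works.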
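As a rough illustration of that cloning step, here is a small sketch. The class names Matched and Mismatched are made up for this example, and the exact failure mode depends on the scikit-learn version: as far as I know, older releases silently fill the missing parameter with None (which matches the output in the question), while newer ones raise AttributeError from get_params.

```python
from sklearn.base import BaseEstimator, clone


class Matched(BaseEstimator):
    """Hypothetical estimator: attribute name equals the __init__ parameter name."""

    def __init__(self, use_dates=("year", "month", "day")):
        self.use_dates = use_dates


class Mismatched(BaseEstimator):
    """Hypothetical estimator: parameter stored under a different name."""

    def __init__(self, use_dates=("year", "month", "day")):
        self._use_dates = use_dates


# clone() calls get_params() and then re-runs __init__ with the result.
good = clone(Matched(use_dates=("year", "month")))
print(good.use_dates)

try:
    bad = clone(Mismatched(use_dates=("year", "month")))
    # Older scikit-learn: get_params() could not find `use_dates`,
    # so the clone was built with use_dates=None.
    outcome = bad._use_dates
except AttributeError:
    # Newer scikit-learn: get_params() refuses to guess and raises.
    outcome = "AttributeError"

print(outcome)
```

Either way, the value passed to the original Mismatched instance does not survive the clone, while the Matched one keeps it.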

See https://scikit-learn.org/stable/developers/develop.html#instantiation
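A minimal sketch of the transformer with that fix applied could look like the following. It is not the only way to write it: beyond renaming the attribute, I have assumed a few changes that are not in the question (a tuple default instead of a mutable list, getattr instead of exec, working on a copy of X, and no print in __init__), and the sample DataFrame is invented purely for the demonstration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class categoryTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, use_dates=("year", "month", "day")):
        # Attribute name matches the parameter name, so clone() can recover it.
        self.use_dates = use_dates

    def fit(self, X, y=None):
        return self

    def get_year(self, obj):
        return str(obj)[:4]

    def get_month(self, obj):
        return str(obj)[4:6]

    def get_day(self, obj):
        return str(obj)[6:8]

    def create_boolean(self, obj):
        return "No" if obj == "0" else "Yes"

    def transform(self, X, y=None):
        X = X.copy()  # avoid modifying the caller's DataFrame
        for spec in self.use_dates:
            # Look up get_year / get_month / get_day by name instead of exec
            X[spec] = X["date"].apply(getattr(self, "get_" + spec))
        X = X.drop("date", axis=1)
        X["yr_renovated"] = X["yr_renovated"].apply(self.create_boolean)
        X["view"] = X["view"].apply(self.create_boolean)
        return X.values


# Invented sample data; values are strings because create_boolean compares to '0'.
sample = pd.DataFrame({
    "date": ["20141013T000000", "20150225T000000"],
    "yr_renovated": ["0", "1991"],
    "view": ["0", "3"],
})

# Cloning (as ColumnTransformer does before fitting) now keeps the setting.
cloned = clone(categoryTransformer(use_dates=("year", "month")))
result = cloned.fit_transform(sample)
print(cloned.use_dates)
print(result.tolist())
```

With this version the same class should drop into cat_pipe and full_pipe from the question unchanged, since the constructor signature is the same apart from the default being a tuple.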
