Sklearn's SimpleImputer: cannot retrieve imputation values inside a pipeline

Question

I am trying to print all of the imputation values after fitting a SimpleImputer. When I use SimpleImputer by itself, I can retrieve them from the instance's statistics_ attribute.

This works fine:

from sklearn.impute import SimpleImputer

s = SimpleImputer(strategy='mean')
s.fit(df[['feature_1', 'feature_2']])
print(s.statistics_)  # one imputation value per fitted column

However, I'm unable to do so when SimpleImputer is used inside a pipeline.

This does not work:

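from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Not shown in the original post; assumed to match the columns in the first snippet
numeric_features = ['feature_1', 'feature_2']
numeric_transformer = Pipeline(steps=[
    ('simple_imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])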

categorical_features = ['feature_3']
categorical_transformer = Pipeline(steps=[
    ('simple_imputer', SimpleImputer(strategy='most_frequent')),
    ('one_hot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', RandomForestClassifier(n_estimators=100))])

clf.fit(df[numeric_features + categorical_features], df['target'])

print(clf.named_steps['preprocessor'].transformers[0][1].named_steps['simple_imputer'].statistics_)

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-523-7390eac0d9d6> in <module>
     19 clf.fit(df[numeric_features + categorical_features], df['target'])
     20 
---> 21 print(clf.named_steps['preprocessor'].transformers[0][1].named_steps['simple_imputer'].statistics_)

AttributeError: 'SimpleImputer' object has no attribute 'statistics_'

I believe I am grabbing the correct instance of the fitted SimpleImputer object. Why can't I retrieve its statistics_ attribute to print the imputation values?

Answer 1

I find it easier to use dot notation when working with sklearn pipelines, not least because autocomplete helps you navigate the pipeline's structure and attributes. In my opinion, it is also more readable.

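The error occurs because ColumnTransformer fits clones of the transformers you pass in. Its transformers attribute (no trailing underscore) still holds the original, unfitted objects, while the fitted copies are exposed through transformers_ and named_transformers_. You can use the following to access the statistics_ attribute of the fitted SimpleImputer: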

imputation_vals = (
    clf
    .named_steps
    .preprocessor
    .named_transformers_
    .num
    .named_steps
    .simple_imputer.statistics_
)
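For anyone who wants to reproduce this end to end, here is a minimal sketch of the same pipeline on toy data. The DataFrame values are made up for illustration (only the column names come from the question), and n_estimators and random_state are set just to keep the run small and repeatable. It contrasts the unfitted imputer reached via transformers with the fitted clone reached via named_transformers_.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    'feature_1': [1.0, 2.0, np.nan, 4.0],
    'feature_2': [10.0, np.nan, 30.0, 40.0],
    'feature_3': ['a', 'b', 'a', 'a'],
    'target': [0, 1, 0, 1],
})
numeric_features = ['feature_1', 'feature_2']
categorical_features = ['feature_3']

numeric_transformer = Pipeline(steps=[
    ('simple_imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('simple_imputer', SimpleImputer(strategy='most_frequent')),
    ('one_hot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)])
clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=10, random_state=0))])
clf.fit(df[numeric_features + categorical_features], df['target'])

fitted_preprocessor = clf.named_steps['preprocessor']

# `transformers` (no underscore) still points at the original, unfitted pipeline.
unfitted_imputer = (
    fitted_preprocessor.transformers[0][1].named_steps['simple_imputer'])
print(hasattr(unfitted_imputer, 'statistics_'))

# `named_transformers_` (trailing underscore) holds the clones fitted by fit().
fitted_imputer = (
    fitted_preprocessor.named_transformers_['num'].named_steps['simple_imputer'])
imputation_vals = fitted_imputer.statistics_

# Pair each numeric column with the value used to fill its missing entries.
print(dict(zip(numeric_features, imputation_vals)))
```

If you prefer to keep the indexing style from the question, changing transformers[0][1] to transformers_[0][1] should reach the same fitted pipeline, since transformers_ is the fitted counterpart of transformers.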

Answer 1 (-1, accepted, 2020-04-05 08:11:20)