Sklearn's SimpleImputer can't retrieve imputation values when in pipeline
I am trying to print out all of the imputation values after fitting with SimpleImputer. When using SimpleImputer by itself, I can retrieve these from the instance's statistics_ attribute.
This works fine:
s = SimpleImputer(strategy='mean')
s.fit(df[['feature_1', 'feature_2']])
print(s.statistics_)
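For reference, a self-contained version of the snippet above might look like the following. This is only a sketch: the toy DataFrame and the `cols` and `fill_values` names are made up for illustration, since the real `df` is not shown in the question.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the original df is not shown in the question.
df = pd.DataFrame({
    "feature_1": [1.0, np.nan, 3.0],
    "feature_2": [4.0, 6.0, np.nan],
})
cols = ["feature_1", "feature_2"]

s = SimpleImputer(strategy="mean")
s.fit(df[cols])

# statistics_ holds one fill value per fitted column, in column order,
# so it can be paired with the column names.
fill_values = dict(zip(cols, s.statistics_))
print(fill_values)
```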
However, I'm unable to do so when using SimpleImputer in a pipeline.
This does not work:
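from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed: the same numeric columns as in the standalone example above
# (the definition was not included in the original post).
numeric_features = ['feature_1', 'feature_2']
numeric_transformer = Pipeline(steps=[
    ('simple_imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])

categorical_features = ['feature_3']
categorical_transformer = Pipeline(steps=[
    ('simple_imputer', SimpleImputer(strategy='most_frequent')),
    ('one_hot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', RandomForestClassifier(n_estimators=100))])
clf.fit(df[numeric_features + categorical_features], df['target'])

# This is the line that raises the AttributeError.
print(clf.named_steps['preprocessor'].transformers[0][1].named_steps['simple_imputer'].statistics_)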
I get the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-523-7390eac0d9d6> in <module>
     19 clf.fit(df[numeric_features + categorical_features], df['target'])
     20
---> 21 print(clf.named_steps['preprocessor'].transformers[0][1].named_steps['simple_imputer'].statistics_)

AttributeError: 'SimpleImputer' object has no attribute 'statistics_'
I believe I am grabbing the correct instance of the fitted SimpleImputer object. Why can't I retrieve its statistics_ attribute to print out the imputation values?
I find it easier to use 'dot' notation when working with sklearn pipelines, not least because you get autocomplete to help you navigate the structure/attributes of the pipeline. It also has the added bonus (in my opinion) of being more readable.
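The important detail is to go through named_transformers_ (with a trailing underscore) rather than transformers. ColumnTransformer clones its transformers when it is fitted, so transformers still holds the unfitted originals you passed in, while named_transformers_ holds the fitted copies. You can use the following line to access the statistics_ attribute of the fitted SimpleImputer: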
imputation_vals = (
    clf
    .named_steps
    .preprocessor
    .named_transformers_
    .num
    .named_steps
    .simple_imputer.statistics_
)
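To see why the indexing in the question fails, here is a minimal sketch that compares the two access paths on a small, made-up DataFrame. The data values, `n_estimators=10`, `random_state=0`, and the `unfitted_imputer` / `fitted_imputer` names are assumptions for illustration only. As far as I know, ColumnTransformer fits clones of the transformers it is given, which is why the `transformers` list (no underscore) still points at estimators that were never fitted.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with a gap in every feature column.
df = pd.DataFrame({
    "feature_1": [1.0, np.nan, 3.0, 5.0],
    "feature_2": [10.0, 20.0, np.nan, 30.0],
    "feature_3": ["a", "b", np.nan, "a"],
    "target": [0, 1, 0, 1],
})
numeric_features = ["feature_1", "feature_2"]
categorical_features = ["feature_3"]

numeric_transformer = Pipeline(steps=[
    ("simple_imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("simple_imputer", SimpleImputer(strategy="most_frequent")),
    ("one_hot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features),
])
clf = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
clf.fit(df[numeric_features + categorical_features], df["target"])

fitted_preprocessor = clf.named_steps["preprocessor"]

# `transformers` (no underscore) is the constructor argument: the original,
# unfitted estimators, so the imputer here has no statistics_ attribute.
unfitted_imputer = fitted_preprocessor.transformers[0][1].named_steps["simple_imputer"]
print(hasattr(unfitted_imputer, "statistics_"))

# `named_transformers_` (trailing underscore) holds the fitted clones.
fitted_imputer = fitted_preprocessor.named_transformers_["num"].named_steps["simple_imputer"]
imputation_vals = fitted_imputer.statistics_
print(dict(zip(numeric_features, imputation_vals)))
```

If you prefer positional indexing as in the question, `transformers_[0][1]` (again with the trailing underscore) should reach the same fitted pipeline as `named_transformers_["num"]`.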