Scikit-learn Imputer with multiple missing values

Question

Is there a way for a Scikit-learn Imputer to look for and replace multiple values that are all considered "missing values"?

For example, I would like to do something like:

imp = Imputer(missing_values=(7,8,9))

But according to the docs, the missing_values parameter only accepts a single integer:

missing_values: integer or "NaN", optional (default="NaN")

The placeholder for the missing values. All occurrences of missing_values will be imputed. For missing values encoded as np.nan, use the string value "NaN".

Answer 1

Why not do this manually in your original dataset? Assuming you are using pd.DataFrame, you can do the following:

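import numpy as np
import pandas as pd
# Imputer was removed in scikit-learn 0.22; on newer versions use
# `from sklearn.impute import SimpleImputer` and `SimpleImputer()` instead.
from sklearn.preprocessing import Imputer

df = pd.DataFrame({'A': [1, 2, 3, 8], 'B': [1, 2, 5, 3]})
# Treat 1 and 2 as missing by converting them to NaN first.
df_new = df.replace([1, 2], np.nan)
# The default strategy fills each NaN with its column mean.
df_imp = Imputer().fit_transform(df_new)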

This results in df_imp:

array([[ 5.5,  4. ],
       [ 5.5,  4. ],
       [ 3. ,  5. ],
       [ 8. ,  3. ]])

If you want to make this part of a pipeline, you would just need to implement a custom transformer with similar logic.
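A minimal sketch of such a transformer, assuming a recent scikit-learn where SimpleImputer lives in sklearn.impute. The class name ValuesToNaN and its values parameter are made up for illustration and are not part of scikit-learn:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline


class ValuesToNaN(TransformerMixin, BaseEstimator):
    """Hypothetical helper: replace every listed sentinel value with np.nan."""

    def __init__(self, values=()):
        self.values = values

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data.
        return self

    def transform(self, X):
        # Work on a float copy so that NaN can be stored.
        X = np.array(X, dtype=float)
        X[np.isin(X, list(self.values))] = np.nan
        return X


df = pd.DataFrame({'A': [1, 2, 3, 8], 'B': [1, 2, 5, 3]})

# Step 1 marks 1 and 2 as missing; step 2 fills them with the column mean.
pipeline = make_pipeline(ValuesToNaN(values=(1, 2)),
                         SimpleImputer(strategy='mean'))
result = pipeline.fit_transform(df)
print(result)
```

On the example DataFrame this should give the same array as the manual replace-then-impute version, since both steps do the same work in the same order.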

Answer 2

You could chain multiple imputers in a pipeline, but that could get unwieldy pretty quickly, and I'm not sure how efficient it is.

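from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Each step replaces one sentinel value with the constant 10.
pipeline = make_pipeline(
    SimpleImputer(missing_values=7, strategy='constant', fill_value=10),
    SimpleImputer(missing_values=8, strategy='constant', fill_value=10),
    SimpleImputer(missing_values=9, strategy='constant', fill_value=10)
)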
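If the list of sentinel values grows, the chain can be generated instead of written out by hand. A rough sketch, assuming the same constant fill of 10; the sentinels tuple and the sample array X are made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical sentinel values that all mean "missing".
sentinels = (7, 8, 9)

# One constant-fill imputer per sentinel, built in a loop.
pipeline = make_pipeline(
    *[SimpleImputer(missing_values=v, strategy='constant', fill_value=10)
      for v in sentinels]
)

# Made-up sample data containing each sentinel once.
X = np.array([[1, 7], [8, 2], [9, 3]])
X_filled = pipeline.fit_transform(X)
print(X_filled)
```

Chaining like this is only safe with a constant fill. With strategy='mean', each step would compute its statistic while the remaining sentinel values are still in the data, so converting everything to NaN first is the more reliable route.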

Answer 1
4, accepted, 2018-06-11 22:38:28

Answer 2
0, 2022-03-14 05:58:30
