Using a LabelEncoder in sklearn's Pipeline gives: fit_transform takes 2 positional arguments but 3 were given

I've been trying to run some ML code, but I keep faltering at the fitting stage after building my pipeline. I've looked around on various forums to little avail. What I've discovered is that some people say you can't use LabelEncoder within a pipeline. I'm not sure how true that is. If anyone has any insights on the matter, I'd be very happy to hear them.

I keep getting this error:

TypeError: fit_transform() takes 2 positional arguments but 3 were given

So I'm not sure whether the problem is with my code or with Python. Here's my code:

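import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (LabelEncoder, OrdinalEncoder,
                                   OneHotEncoder, StandardScaler)
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Assumption: TargetEncoder and CatBoostEncoder come from category_encoders
from category_encoders import TargetEncoder, CatBoostEncoder

data = pd.read_csv("ks-projects-201801.csv",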
                   index_col="ID",
                   parse_dates=["deadline","launched"],
                   infer_datetime_format=True)

var = list(data)

data = data.drop(labels=[1014746686,1245461087, 1384087152, 1480763647, 330942060, 462917959, 69489148])
missing = [i for i in var if data[i].isnull().any()]
data = data.dropna(subset=missing,axis=0)
le = LabelEncoder()
oe = OrdinalEncoder()
oh = OneHotEncoder()
y = [i for i in var if i=="state"]
y = data[var.pop(8)]

p,p.index = pd.Series(le.fit_transform(y)),y.index
q = pd.read_csv("y.csv",index_col="ID")["0"]
label_y = le.fit_transform(y)

x = data[var]

obj_feat = x.select_dtypes(include="object")
dat_feat = x.select_dtypes(include="datetime64[ns]")
dat_feat = dat_feat.assign(dmonth=dat_feat.deadline.dt.month.astype("int64"),
                           dyear = dat_feat.deadline.dt.year.astype("int64"),
                           lmonth=dat_feat.launched.dt.month.astype("int64"),
                           lyear=dat_feat.launched.dt.year.astype("int64"))
dat_feat = dat_feat.drop(labels=["deadline","launched"],axis=1)
num_feat = x.select_dtypes(include=["int64","float64"])

u = dict(zip(list(obj_feat),[len(obj_feat[i].unique()) for i in obj_feat]))
le_obj = [i for i in u if u[i]<10]
oh_obj = [i for i in u if u[i]<20 and u[i]>10]
te_obj = [i for i in u if u[i]>20 and u[i]<25]
cb_obj = [i for i in u if u[i]>100]

# Pipeline time
#Impute and encode

strat = ["constant","most_frequent","mean","median"]
sc = StandardScaler()
oh_unk = "ignore"
encoders = [LabelEncoder(),
            OneHotEncoder(handle_unknown=oh_unk),
            TargetEncoder(),
            CatBoostEncoder()]

#num_trans = Pipeline(steps=[("imp",SimpleImputer(strategy=strat[2])),
num_trans = Pipeline(steps=[("sc",sc)])
#obj_imp = Pipeline(steps=[("imp",SimpleImputer(strategy=strat[1]))])
oh_enc = Pipeline(steps=[("oh_enc",encoders[1])])
te_enc = Pipeline(steps=[("te_enc",encoders[2])])
cb_enc = Pipeline(steps=[("cb_enc",encoders[0])])

trans = ColumnTransformer(transformers=[
                                        ("num",num_trans,list(num_feat)+list(dat_feat)),
                                        #("obj",obj_imp,list(obj_feat)),
                                        ("onehot",oh_enc,oh_obj),
                                        ("target",te_enc,te_obj),
                                        ("catboost",cb_enc,cb_obj)
                                        ])

models = [RandomForestClassifier(random_state=0),
          KNeighborsClassifier(),
          DecisionTreeClassifier(random_state=0)]

model = models[2]

print("Check 4")

# Chaining it all together
run = Pipeline(steps=[("Transformation",trans),("Model",model)])

x = pd.concat([obj_feat,dat_feat,num_feat],axis=1)
print("Check 5")
run.fit(x,p)

It runs fine until run.fit, where it throws the error. I'd love to hear any advice anyone might have, and any possible ways to resolve this problem would also be greatly appreciated! Thank you.

The problem is the same as the one spotted in this answer, but with a LabelEncoder in your case. The LabelEncoder's fit_transform method takes:

def fit_transform(self, y):
    """Fit label encoder and return encoded labels
    ...
    """

Whereas Pipeline expects all of its transformers to accept three positional arguments, fit_transform(self, X, y). That is why the error says 2 positional arguments (self, y) are taken but 3 (self, X, y) were given.
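The mismatch can be reproduced outside any pipeline. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the toy arrays `X` and `y` are made up for illustration and are not taken from the question's dataset.

```python
import inspect

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Toy data, invented for illustration only.
X = np.array([["a"], ["b"], ["a"]])
y = np.array(["failed", "successful", "failed"])

# LabelEncoder is meant for targets: its fit_transform accepts only y.
label_params = list(inspect.signature(LabelEncoder.fit_transform).parameters)
# A feature transformer such as OneHotEncoder accepts X and an optional y.
onehot_params = list(inspect.signature(OneHotEncoder.fit_transform).parameters)
print(label_params)
print(onehot_params)

# Pipeline and ColumnTransformer call fit_transform(X, y) on each step,
# which is one positional argument too many for LabelEncoder.
try:
    LabelEncoder().fit_transform(X, y)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Calling `LabelEncoder().fit_transform(y)` with the target alone works as expected, which is the use the class is designed for.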

You could make a custom transformer as in the aforementioned answer; however, a LabelEncoder should not be used as a feature transformer. An extensive explanation of why can be found in "LabelEncoder for categorical features?". So I'd recommend not using a LabelEncoder and, if the number of features gets too high, using one of the Bayesian encoders instead, such as the TargetEncoder, which you also have in your list of encoders.
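As a rough sketch of that advice, the LabelEncoder can be kept for the target only, outside the pipeline, while the feature columns go through transformers that accept (X, y). The toy frame and its column names (`currency`, `goal`) are hypothetical stand-ins, not the question's data, and OneHotEncoder is used here only because it ships with scikit-learn; a TargetEncoder could take its place in the same slot. Note also that in the question's code the step named `cb_enc` is built from `encoders[0]`, which is the LabelEncoder; if the CatBoostEncoder was intended, that would be `encoders[3]`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy frame, invented for illustration; column names are hypothetical.
X = pd.DataFrame({
    "currency": ["USD", "GBP", "USD", "EUR", "GBP", "USD"],
    "goal": [1000.0, 250.0, 5000.0, 800.0, 1200.0, 300.0],
})
state = pd.Series(["failed", "successful", "failed",
                   "successful", "failed", "successful"])

# LabelEncoder is applied to the target only, outside the pipeline.
le = LabelEncoder()
y = le.fit_transform(state)

# Feature columns go through transformers whose fit_transform takes (X, y).
trans = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["goal"]),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["currency"]),
])

run = Pipeline(steps=[("Transformation", trans),
                      ("Model", DecisionTreeClassifier(random_state=0))])
run.fit(X, y)

# Map the integer predictions back to the original label strings.
pred = le.inverse_transform(run.predict(X))
print(pred)
```

Keeping the fitted `le` around is what allows `inverse_transform` to recover the original class names from the model's integer predictions.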


