
ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,); the error persists even after specifying the columns as indexes

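    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

    trf1=ColumnTransformer([("Infuse_val",SimpleImputer(strategy="mean"),[0])],remainder="passthrough")
    # sparse_output was named sparse before scikit-learn 1.2
    trf4=ColumnTransformer([("One_hot",OneHotEncoder(sparse_output=False,handle_unknown="ignore"),[1,4])],remainder="passthrough")
    trf2=ColumnTransformer([("Ord_encode",OrdinalEncoder(categories=["Strong","Mild"]),[3])],remainder="passthrough")
    trf3=ColumnTransformer([("scale",StandardScaler(),[0,2])],remainder="passthrough")
    pipe = Pipeline([
        ('trf1',trf1),
        ('trf2',trf2),
        ('trf3',trf3),
        ('trf4',trf4),
    ])
    pipe.fit(x_train,y_train)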

Error

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).

The table is:

[Image: the data table]

I don't understand what the error in my code is.

The error isn't about the column transformers; it's about the OrdinalEncoder. The categories parameter needs to be a list of lists: for each column, the list of categories in that column. Since you have just one column, categories=[["Strong","Mild"]] should work.
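A minimal sketch of the difference, using a hypothetical four-row column X rather than the asker's data and assuming a recent scikit-learn: the flat list reproduces the shape-mismatch error because it is read as two per-feature lists for a single feature, while the nested list fits.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single ordinal column, shaped (n_samples, n_features) = (4, 1).
X = np.array([["Mild"], ["Strong"], ["Mild"], ["Strong"]], dtype=object)

# A flat list is read as "one category list per feature": 2 entries for 1 feature.
flat_error = None
try:
    OrdinalEncoder(categories=["Strong", "Mild"]).fit(X)
except ValueError as err:
    flat_error = str(err)
    print("flat list ->", flat_error)

# A list of lists has one inner list per column, so its length matches n_features.
enc = OrdinalEncoder(categories=[["Strong", "Mild"]])
encoded = enc.fit_transform(X)
print(encoded.ravel())  # -> [1. 0. 1. 0.]
```

The position in the inner list sets the code, so here "Strong" becomes 0 and "Mild" becomes 1.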

With just two categories, most downstream algorithms won't care which one is encoded as 0 and which as 1, so here you could simply keep the default, categories="auto".
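A small sketch of what the default does, on a hypothetical three-row column: with categories="auto", OrdinalEncoder learns each column's sorted unique values, so the codes follow alphabetical order.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single column of severities.
X = np.array([["Strong"], ["Mild"], ["Mild"]], dtype=object)

# categories="auto" (the default) learns each column's sorted unique values.
enc = OrdinalEncoder()
codes = enc.fit_transform(X)
print(enc.categories_[0].tolist())  # -> ['Mild', 'Strong']
print(codes.ravel())                # -> [1. 0. 0.]
```

Alphabetical order happens to match Mild < Strong here; when the meaningful order differs from the alphabetical one, pass it explicitly, e.g. categories=[["Mild", "Strong"]].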

Finally, you'll have problems with your column transformers. They change the order (and names) of the columns, so by the end of the pipeline, columns 0 and 2 are no longer necessarily the two numeric columns. For example, after trf2 the encoded column 3 comes first, so the columns 0 and 2 that trf3 scales are the original columns 3 and 1. The column order is predictable (each transformer's output columns in order, followed by the passthrough columns), so you could keep track of it manually. But I would suggest a single column transformer with multiple pipelines instead.


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM