Missing column labels after imputing missing values in scikit-learn

Question

I recently tried to fill missing data in a dataset, after which the column labels in the dataset disappeared. The dataset I was using looks like this:

     Make  Colour  Odometer (KM)  Doors
0   Honda   White        35431.0    4.0
1     BMW    Blue       192714.0    5.0
2   Honda   White        84714.0    4.0
3  Toyota   White       154365.0    4.0
4  Nissan    Blue       181577.0    3.0

The method I used is:

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

cat_imputer=SimpleImputer(strategy='constant',fill_value='missing')
door_imputer=SimpleImputer(strategy='constant',fill_value=4)
num_imputer=SimpleImputer(strategy="mean")

cat_features=["Make","Colour"]
door_features=["Doors"]
num_features=["Odometer (KM)"]

imputer = ColumnTransformer([
    ("cat_imputer",cat_imputer,cat_features),
    ("door_imputer",door_imputer,door_features),
    ("num_imputer",num_imputer,num_features)
])

# X is the feature DataFrame shown above
filled_x=imputer.fit_transform(X)
new_frame=pd.DataFrame(filled_x)

After this, new_frame looks like this:

       0      1    2         3
0   Honda  White  4.0   35431.0
1     BMW   Blue  5.0  192714.0
2   Honda  White  4.0   84714.0
3  Toyota  White  4.0  154365.0
4  Nissan   Blue  3.0  181577.0

So what happened here?

Why are the Doors and Odometer (KM) columns flipped?
Where did the column labels go?
How can I bring them back?

Answer 1

This happens because fit_transform returns a brand-new array, filled_x, whose type is numpy.ndarray. A plain NumPy array carries no column labels or index, so you either assign the values back to the original DataFrame or put them into a new DataFrame equipped with the old DataFrame's columns and index.
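A minimal sketch of the second option that also works with the question's original transformer order (door_imputer before num_imputer). It relies on ColumnTransformer stacking its outputs in the order the transformers are listed, so the labels can be collected from the column lists stored in imputer.transformers. It assumes every column is handled by some transformer (with the default remainder='drop', any other column is discarded), and X holds made-up sample values with a few gaps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Made-up sample data shaped like the question's dataset, with a few gaps.
X = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", "Nissan"],
    "Colour": ["White", "Blue", np.nan, "White", "Blue"],
    "Odometer (KM)": [35431.0, 192714.0, 84714.0, 154365.0, np.nan],
    "Doors": [4.0, 5.0, 4.0, np.nan, 3.0],
})

# Same transformer order as in the question: Doors is imputed before Odometer.
imputer = ColumnTransformer([
    ("cat_imputer", SimpleImputer(strategy="constant", fill_value="missing"), ["Make", "Colour"]),
    ("door_imputer", SimpleImputer(strategy="constant", fill_value=4), ["Doors"]),
    ("num_imputer", SimpleImputer(strategy="mean"), ["Odometer (KM)"]),
])
filled_x = imputer.fit_transform(X)

# Output columns follow the transformer list, so collect the labels in that order.
labels = [col for _, _, cols in imputer.transformers for col in cols]

# Attach the labels and the original index, then restore X's column order.
new_frame = pd.DataFrame(filled_x, index=X.index, columns=labels)[X.columns]
print(new_frame)
```

Deriving the labels from the transformer list, rather than copying X.columns, keeps them correct even when the transformers are not listed in the DataFrame's column order.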

df
###
     Make Colour  Odometer (KM)  Doors
0   Honda  White        35431.0    4.0
1     BMW   Blue       192714.0    5.0
2   Honda    NaN        84714.0    4.0
3  Toyota  White       154365.0    4.0
4  Nissan   Blue       181577.0    NaN

I modified your df to include some NaN values so we can check whether the imputer actually works.
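To reproduce this locally, a small sketch that builds the modified df by hand; the values are copied from the table above, with np.nan marking the two gaps.

```python
import numpy as np
import pandas as pd

# The question's five rows, with two values blanked out to exercise the imputers.
df = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", "Nissan"],
    "Colour": ["White", "Blue", np.nan, "White", "Blue"],
    "Odometer (KM)": [35431.0, 192714.0, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, 4.0, np.nan],
})

# Count missing values per column to confirm where the gaps are.
print(df.isna().sum())
```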

Imputation

Why are the Doors and Odometer (KM) columns flipped?

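ColumnTransformer stacks the transformers' outputs in the order the transformers are listed, not in the original DataFrame's column order. Your list puts door_imputer before num_imputer, so Doors ends up before Odometer (KM). Listing the transformers in the same order as the DataFrame's columns fixes it: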

from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer

cat_imputer=SimpleImputer(strategy='constant',fill_value='missing')
door_imputer=SimpleImputer(strategy='constant',fill_value=4)
num_imputer=SimpleImputer(strategy="mean")

cat_features=["Make","Colour"]
door_features=["Doors"]
num_features=["Odometer (KM)"]

imputer = ColumnTransformer([
    ("cat_imputer",cat_imputer,cat_features),
    ("num_imputer",num_imputer,num_features),     # moved up: Odometer (KM) precedes Doors in df
    ("door_imputer",door_imputer,door_features)   # now last, matching df's column order
])

filled_x=imputer.fit_transform(df)

filled_x
###
[['Honda' 'White' 35431.0 4.0]
 ['BMW' 'Blue' 192714.0 5.0]
 ['Honda' 'missing' 84714.0 4.0]
 ['Toyota' 'White' 154365.0 4.0]
 ['Nissan' 'Blue' 181577.0 4.0]]

During processing, scikit-learn works with NumPy arrays, and that is what fit_transform returns:

type(filled_x)
###
numpy.ndarray
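One side effect worth checking: because the string columns and the numeric columns are stacked into a single array, NumPy falls back to dtype object, so every column of a DataFrame built directly from filled_x is object as well. A sketch, assuming the modified df and the reordered transformers from this answer, that uses pandas' DataFrame.infer_objects() to recover numeric dtypes:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", "Nissan"],
    "Colour": ["White", "Blue", np.nan, "White", "Blue"],
    "Odometer (KM)": [35431.0, 192714.0, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, 4.0, np.nan],
})

# Transformers listed in df's column order, as in the answer above.
imputer = ColumnTransformer([
    ("cat_imputer", SimpleImputer(strategy="constant", fill_value="missing"), ["Make", "Colour"]),
    ("num_imputer", SimpleImputer(strategy="mean"), ["Odometer (KM)"]),
    ("door_imputer", SimpleImputer(strategy="constant", fill_value=4), ["Doors"]),
])
filled_x = imputer.fit_transform(df)
print(filled_x.dtype)  # object: strings and floats share one array

new_frame = pd.DataFrame(filled_x, index=df.index, columns=df.columns)
# infer_objects() converts object columns holding only numbers back to float64.
new_frame = new_frame.infer_objects()
print(new_frame.dtypes)
```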

Calling pd.DataFrame(filled_x) puts the ndarray returned by imputer.fit_transform() into a brand-new DataFrame without any column or index information, so pandas falls back to default integer labels (0, 1, 2, 3).

Because the transformers are now listed in the same order as df's columns, you can pass df's labels and index explicitly:

new_frame = pd.DataFrame(filled_x, index=df.index, columns=df.columns)
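new_frame
###
     Make   Colour  Odometer (KM)  Doors
0   Honda    White        35431.0    4.0
1     BMW     Blue       192714.0    5.0
2   Honda  missing        84714.0    4.0
3  Toyota    White       154365.0    4.0
4  Nissan     Blue       181577.0    4.0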

Alternatively (though it is not recommended), you can overwrite the old df with the result:

df.loc[:,:] = filled_x

In practice, it is better to keep the original df untouched so you can try different imputation methods.

Solution 1
0 2022-07-17 07:16:06
