Missing column labels after filling missing values in scikit-learn
I recently tried to fill in missing data in a dataset, after which the column labels in the dataset disappeared. The dataset I was using looks like this:
| | Make | Colour | Odometer (KM) | Doors |
|---|---|---|---|---|
| 0 | Honda | White | 35431.0 | 4.0 |
| 1 | BMW | Blue | 192714.0 | 5.0 |
| 2 | Honda | White | 84714.0 | 4.0 |
| 3 | Toyota | White | 154365.0 | 4.0 |
| 4 | Nissan | Blue | 181577.0 | 3.0 |
The method I used is:
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# One imputer per kind of column
cat_imputer = SimpleImputer(strategy="constant", fill_value="missing")
door_imputer = SimpleImputer(strategy="constant", fill_value=4)
num_imputer = SimpleImputer(strategy="mean")

cat_features = ["Make", "Colour"]
door_features = ["Doors"]
num_features = ["Odometer (KM)"]

imputer = ColumnTransformer([
    ("cat_imputer", cat_imputer, cat_features),
    ("door_imputer", door_imputer, door_features),
    ("num_imputer", num_imputer, num_features)
])

# X is the DataFrame shown above
filled_x = imputer.fit_transform(X)
new_frame = pd.DataFrame(filled_x)
After this, new_frame looks like this:
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Honda | White | 4.0 | 35431.0 |
| 1 | BMW | Blue | 5.0 | 192714.0 |
| 2 | Honda | White | 4.0 | 84714.0 |
| 3 | Toyota | White | 4.0 | 154365.0 |
| 4 | Nissan | Blue | 3.0 | 181577.0 |
So what happened here?
Why are the Doors and Odometer (KM) columns flipped?
Where did the column labels go?
How can I bring them back?
This happens because fit_transform() returns a brand-new array, filled_x, whose type is numpy.ndarray. You can either assign its values back to the original DataFrame, or build a new DataFrame that reuses the old DataFrame's columns and index.
df
###
Make Colour Odometer (KM) Doors
0 Honda White 35431.0 4.0
1 BMW Blue 192714.0 5.0
2 Honda NaN 84714.0 4.0
3 Toyota White 154365.0 4.0
4 Nissan Blue 181577.0 NaN
I modified your df to include some NaN values, so we can check whether the imputer works.
Why are the Doors and Odometer (KM) columns flipped?
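The order of the transformers in your ColumnTransformer differs from the column order of your original DataFrame. The output columns follow the transformer order, so Doors ends up before Odometer (KM). Listing the transformers in the same order as the DataFrame's columns fixes this: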
cat_imputer = SimpleImputer(strategy="constant", fill_value="missing")
door_imputer = SimpleImputer(strategy="constant", fill_value=4)
num_imputer = SimpleImputer(strategy="mean")

cat_features = ["Make", "Colour"]
door_features = ["Doors"]
num_features = ["Odometer (KM)"]

# num_imputer now comes before door_imputer, matching df's column order
imputer = ColumnTransformer([
    ("cat_imputer", cat_imputer, cat_features),
    ("num_imputer", num_imputer, num_features),
    ("door_imputer", door_imputer, door_features)
])

filled_x = imputer.fit_transform(df)
filled_x
###
[['Honda' 'White' 35431.0 4.0]
['BMW' 'Blue' 192714.0 5.0]
['Honda' 'missing' 84714.0 4.0]
['Toyota' 'White' 154365.0 4.0]
['Nissan' 'Blue' 181577.0 4.0]]
Internally, sklearn works with arrays (numpy.ndarray), so the result is an ndarray rather than a DataFrame:
type(filled_x)
###
numpy.ndarray
Calling pd.DataFrame(filled_x) puts the ndarray returned by imputer.fit_transform() into a brand-new DataFrame without any column or index information, so pandas falls back to default integer labels.
You can assign them like this:
new_frame = pd.DataFrame(filled_x, index=df.index, columns=df.columns)
new_frame
###
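One detail worth checking after rebuilding the frame: because the array mixes strings and numbers, its dtype is object, so the numeric columns of the new DataFrame are object as well. A minimal sketch of casting them back, using small stand-in values for df and filled_x (both are illustrative here, not the real imputer output):

```python
import numpy as np
import pandas as pd

# Stand-in for the original frame; only its dtypes, index and columns matter here.
df = pd.DataFrame({
    "Make": ["Honda", "BMW"],
    "Colour": ["White", "Blue"],
    "Odometer (KM)": [35431.0, 192714.0],
    "Doors": [4.0, 5.0],
})

# Stand-in for the imputer output: a mixed string/number array has dtype object.
filled_x = np.array(
    [["Honda", "White", 35431.0, 4.0],
     ["BMW", "Blue", 192714.0, 5.0]],
    dtype=object,
)

new_frame = pd.DataFrame(filled_x, index=df.index, columns=df.columns)
print(new_frame.dtypes)  # the numeric columns are object at this point

# Cast each column back to the dtype it had in the original frame.
new_frame = new_frame.astype(df.dtypes.to_dict())
print(new_frame.dtypes)
```

Without the cast, numeric operations on Odometer (KM) or Doors may be slower or behave unexpectedly, since the values are stored as generic Python objects.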
You can also overwrite the old df with the result, although this is not recommended:
df.loc[:,:] = filled_x
In practice, it is better to keep df intact so you can try different imputation methods.
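As a rough illustration of why keeping df untouched is handy, the sketch below compares two SimpleImputer strategies on the same frame. The single-column df and its NaN position are made up for the example; only the odometer values come from the data above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical frame: the odometer values from above, with one reading removed.
df = pd.DataFrame({"Odometer (KM)": [35431.0, np.nan, 84714.0, 154365.0, 181577.0]})

results = {}
for strategy in ("mean", "median"):
    imputer = SimpleImputer(strategy=strategy)
    filled = imputer.fit_transform(df)  # returns a new ndarray; df is not modified
    results[strategy] = pd.DataFrame(filled, index=df.index, columns=df.columns)

print(results["mean"].loc[1, "Odometer (KM)"])    # value filled in by the mean strategy
print(results["median"].loc[1, "Odometer (KM)"])  # value filled in by the median strategy
print(df["Odometer (KM)"].isna().sum())           # the original still contains its NaN
```

Each result lives in its own DataFrame, so the strategies can be compared side by side against the unmodified original.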