
TypeError: ufunc 'isnan' not supported for the input types, Using Imputer for NaN values

I'm a newbie in Python and pandas. I'm trying to preprocess a big dataframe that contains both numerical and categorical features, and some columns contain NaN values. First I try to get the feature matrix, and then I use Imputer to replace the NaN values with the mean or median.

This is the dataframe:

    MSSubClass MSZoning  LotFrontage  LotArea Street LotShape LandContour  \
0             60       RL         65.0     8450   Pave      Reg         Lvl   
1             20       RL         80.0     9600   Pave      Reg         Lvl   
2             60       RL         68.0    11250   Pave      IR1         Lvl   
3             70       RL         60.0     9550   Pave      IR1         Lvl   
4             60       RL         84.0    14260   Pave      IR1         Lvl   
5             50       RL         85.0    14115   Pave      IR1         Lvl   
6             20       RL         75.0    10084   Pave      Reg         Lvl   
7             60       RL          NaN    10382   Pave      IR1         Lvl   
8             50       RM         51.0     6120   Pave      Reg         Lvl   
9            190       RL         50.0     7420   Pave      Reg         Lvl   
10            20       RL         70.0    11200   Pave      Reg         Lvl   
11            60       RL         85.0    11924   Pave      IR1         Lvl

Code: this is only meant to replace the NaN values in LotFrontage (column index 2) with the mean of that column.

imputer = Imputer(missing_values='Nan',strategy="mean",axis=0)
features = reduced_data.iloc[:,:-1].values
imputer.fit(features[:,2])

When I run this, I get the following error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''    

First: is my approach correct? Second: how do I handle the error?

Thanks

Note the difference between 'Nan' and 'NaN' (with a capital N at the end); you used 'Nan'. The line should be:

  imputer = Imputer(missing_values='NaN',strategy="mean",axis=0)

Replace 'Nan' with 'NaN' and you won't get this error.
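
To see where the error message itself comes from, here is a minimal sketch that needs only NumPy. `np.isnan` is defined for numeric input only, so calling it on a string such as 'Nan' raises the same TypeError. That the old Imputer falls back to an `isnan` check whenever `missing_values` is not exactly the string 'NaN' is an assumption about the library's internals; the helper name `isnan_error` is made up for this illustration.

```python
import numpy as np

def isnan_error(value):
    """Return the TypeError message np.isnan raises for `value`, or None."""
    try:
        np.isnan(value)
    except TypeError as exc:
        return str(exc)
    return None

msg_string = isnan_error('Nan')   # a string is not a numeric type
msg_float = isnan_error(np.nan)   # a real float NaN is handled without error

print(msg_string)
print(msg_float)
```

The first print shows the "ufunc 'isnan' not supported" message, and the second prints None.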

Try this; it is an example of working code:

import numpy as np
from sklearn.preprocessing import Imputer

# X is assumed to be a 2-D numeric feature matrix with missing values in columns 1 and 2
imputer = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
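
On scikit-learn 0.22 and later, `Imputer` is no longer available in `sklearn.preprocessing`; `sklearn.impute.SimpleImputer` fills the same role and always works column-wise, so it takes no `axis` argument. The following is a sketch of the equivalent code under that assumption, using a made-up toy matrix `X` rather than the data from the question.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix (made-up values); column 1 has one missing entry.
X = np.array([[1.0, 65.0, 8450.0],
              [2.0, np.nan, 9600.0],
              [3.0, 75.0, 11250.0]])

# SimpleImputer imputes per column; there is no axis parameter.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

print(X)
```

The missing entry is replaced by the mean of the other values in its column.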

My guess is that, because of the string 'Nan', the data in your LotFrontage column is stored with the object data type. You can check with the line below; it will most likely print an object/string dtype.

print(reduced_data.LotFrontage.values.dtype)

Imputer only works on numeric (float) data.
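
An object dtype can also appear when the column itself is float: calling `.values` on a frame that mixes numeric and categorical columns gives a single object array. The sketch below illustrates this with a small made-up frame; the column names are borrowed from the question, but the data is not.

```python
import numpy as np
import pandas as pd

# Small made-up frame mixing a numeric and a categorical column.
df = pd.DataFrame({
    'LotFrontage': [65.0, np.nan, 68.0],
    'Street': ['Pave', 'Pave', 'Pave'],
})

mixed = df.values                       # whole frame -> object dtype
numeric = df[['LotFrontage']].values    # numeric column only -> float64

print(mixed.dtype)
print(numeric.dtype)
```

Selecting only the numeric columns before calling `.values` keeps the array in a float dtype that an imputer can handle.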

1st Approach:

You can do the following: 1) convert the column to float, 2) compute the mean of the LotFrontage column, 3) use the pandas fillna function to fill the NaNs.

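import pandas as pd

reduced_data.LotFrontage = pd.to_numeric(reduced_data.LotFrontage, errors='coerce')
m = reduced_data.LotFrontage.mean(skipna=True)
# fillna returns a new object, so assign the result back to the column
reduced_data.LotFrontage = reduced_data.LotFrontage.fillna(m)

The code above replaces every NaN in LotFrontage with the column mean. Calling reduced_data.fillna(m) on the whole dataframe would instead fill the NaNs in every column with the LotFrontage mean, and the result would be lost unless it is assigned.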
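
As a worked example of this approach, the sketch below rebuilds only the LotFrontage column from the twelve rows shown in the question and fills the single missing value with the mean. The one-column frame is a simplification for illustration, not the full dataset.

```python
import numpy as np
import pandas as pd

# LotFrontage values copied from the question's dataframe (row 7 is missing).
reduced_data = pd.DataFrame({
    'LotFrontage': [65.0, 80.0, 68.0, 60.0, 84.0, 85.0,
                    75.0, np.nan, 51.0, 50.0, 70.0, 85.0],
})

mean_value = reduced_data['LotFrontage'].mean()  # NaN is skipped by default
reduced_data['LotFrontage'] = reduced_data['LotFrontage'].fillna(mean_value)

print(round(mean_value, 2))
print(reduced_data['LotFrontage'].isna().sum())
```

The eleven known values sum to 773, so the mean is 773/11 (about 70.27), and row 7 ends up with that value.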

2nd Approach:

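reduced_data.LotFrontage = pd.to_numeric(reduced_data.LotFrontage, errors='coerce')
imputer = Imputer()  # defaults: missing_values='NaN', strategy='mean', axis=0
features = reduced_data.iloc[:, :-1].values
# Imputer expects 2-D input, so slice with 2:3 rather than indexing with 2
features[:, 2:3] = imputer.fit_transform(features[:, 2:3])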

In the missing_values parameter, use 'NaN' instead of 'Nan': imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)

This should work:

imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(df.iloc[:, 2:3])
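
The `2:3` slice in the snippet above matters: scikit-learn estimators generally expect a 2-D array of shape (n_samples, n_features), and indexing with a plain integer drops the column axis. The sketch below shows the difference on a made-up NumPy matrix standing in for the question's `features`.

```python
import numpy as np

# Made-up 3x4 feature matrix standing in for the question's `features`.
features = np.arange(12, dtype=float).reshape(3, 4)

one_d = features[:, 2]      # integer index drops the column axis
two_d = features[:, 2:3]    # slice keeps it

print(one_d.shape)
print(two_d.shape)
```

The same applies to pandas: `df.iloc[:, 2:3]` returns a one-column DataFrame, while `df.iloc[:, 2]` returns a Series.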

