TypeError: ufunc 'isnan' not supported for the input types, using Imputer on NaN values

Question

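I am new to Python and pandas. I am trying to preprocess a large DataFrame made up of numeric and categorical features, and some columns contain NaN values. First I build the feature matrix, then I use Imputer to replace the NaN values with the column mean or median.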

Here is the DataFrame:

    MSSubClass MSZoning  LotFrontage  LotArea Street LotShape LandContour  \
0             60       RL         65.0     8450   Pave      Reg         Lvl   
1             20       RL         80.0     9600   Pave      Reg         Lvl   
2             60       RL         68.0    11250   Pave      IR1         Lvl   
3             70       RL         60.0     9550   Pave      IR1         Lvl   
4             60       RL         84.0    14260   Pave      IR1         Lvl   
5             50       RL         85.0    14115   Pave      IR1         Lvl   
6             20       RL         75.0    10084   Pave      Reg         Lvl   
7             60       RL          NaN    10382   Pave      IR1         Lvl   
8             50       RM         51.0     6120   Pave      Reg         Lvl   
9            190       RL         50.0     7420   Pave      Reg         Lvl   
10            20       RL         70.0    11200   Pave      Reg         Lvl   
11            60       RL         85.0    11924   Pave      IR1         Lvl

Code (only replacing the NaN values in LotFrontage, column index 2, with the column mean):

imputer = Imputer(missing_values='Nan',strategy="mean",axis=0)
features = reduced_data.iloc[:,:-1].values
imputer.fit(features[:,2])

When I run it, I get this error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

First: is my approach correct? Second: how do I fix the error?

Thanks

Answer 1

Note the difference between Nan and NaN (note the capital N at the end). You used Nan; it should be:

  imputer = Imputer(missing_values='NaN',strategy="mean",axis=0)

Replace "Nan" with "NaN" and you will no longer get this error.
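Why the capitalization matters: a minimal sketch, assuming the pre-0.22 Imputer built its missing-value mask roughly as below (the helper name make_missing_mask is hypothetical). Only the exact string "NaN" or a real float NaN routes to np.isnan(X); any other string, such as "Nan", is itself handed to np.isnan, which cannot handle strings and raises the TypeError from the question.

```python
import numpy as np


def make_missing_mask(X, missing_values):
    """Hypothetical helper approximating the old Imputer's mask logic."""
    # The literal string "NaN" or an actual float NaN means "mask NaN entries".
    # Any other string falls through to np.isnan(missing_values), which raises
    # TypeError for strings such as "Nan".
    if missing_values == "NaN" or np.isnan(missing_values):
        return np.isnan(X)
    # Otherwise treat missing_values as an ordinary placeholder value.
    return X == missing_values


X = np.array([[65.0], [np.nan], [68.0]])
print(make_missing_mask(X, "NaN").ravel().tolist())   # [False, True, False]
print(make_missing_mask(X, np.nan).ravel().tolist())  # [False, True, False]
try:
    make_missing_mask(X, "Nan")
except TypeError as exc:
    print("TypeError:", exc)
```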

Answer 2

Try this; here is an example of working code:

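import numpy as np
from sklearn.preprocessing import Imputer  # scikit-learn < 0.22 only

imputer = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])           # learn the means of columns 1 and 2
X[:, 1:3] = imputer.transform(X[:, 1:3])   # replace their NaNs with those means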
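Imputer was deprecated in scikit-learn 0.20 and removed in 0.22. A minimal sketch of the same idea with its replacement, sklearn.impute.SimpleImputer, assuming a recent scikit-learn: there is no axis parameter (it always imputes column by column), and missing_values should be np.nan rather than a string. The small X array is made-up data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up feature matrix; columns 1 and 2 contain missing values.
X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

# SimpleImputer always works column-wise, so there is no axis argument.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X[:, 1:3].tolist())  # [[2.0, 7.5], [5.0, 6.0], [8.0, 9.0]]
```

The fitted column means are kept in imputer.statistics_, so the same imputer can later be applied to a test set with imputer.transform.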

Answer 3

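My guess is that, because of the string "Nan", your LotFrontage column is stored with the object dtype. Check it with the line below; it will most likely print object (i.e. strings).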

print(reduced_data.LotFrontage.values.dtype)

Imputer only works with numeric (float) data.
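To see the effect described here, a minimal sketch assuming the column contains a literal "Nan" string (the values are made up): pandas stores such a mixed column as object, and pd.to_numeric(..., errors='coerce') turns entries it cannot convert into real NaN values, giving a float64 column that an imputer can handle.

```python
import pandas as pd

# Made-up LotFrontage values with a literal "Nan" string mixed in.
lot_frontage = pd.Series([65.0, 80.0, "Nan", 60.0])
print(lot_frontage.dtype)  # object

# errors="coerce" converts unparseable entries to NaN instead of raising.
lot_frontage = pd.to_numeric(lot_frontage, errors="coerce")
print(lot_frontage.dtype)         # float64
print(lot_frontage.isna().sum())  # 1
```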

First approach:

You can do the following: 1) convert the column to float; 2) compute the mean of the LotFrontage column; 3) fill the NaNs in the DataFrame with the pandas DataFrame method fillna.

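import pandas as pd

reduced_data['LotFrontage'] = pd.to_numeric(reduced_data['LotFrontage'], errors='coerce')
m = reduced_data['LotFrontage'].mean(skipna=True)
reduced_data = reduced_data.fillna(m)  # fillna returns a new DataFrame, so assign it back

The code above fills every NaN in the DataFrame, in all columns and not only LotFrontage, with the LotFrontage mean.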
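If only LotFrontage should be filled, a minimal sketch (toy data; the LotArea NaN is made up for illustration) that assigns the filled Series back to that one column, leaving NaNs in other columns untouched:

```python
import numpy as np
import pandas as pd

# Toy frame: LotFrontage and a second column both contain a NaN.
reduced_data = pd.DataFrame({
    "LotArea": [8450.0, np.nan, 11250.0],
    "LotFrontage": [65.0, np.nan, 85.0],
})

m = reduced_data["LotFrontage"].mean(skipna=True)  # (65 + 85) / 2 = 75.0
reduced_data["LotFrontage"] = reduced_data["LotFrontage"].fillna(m)
print(reduced_data["LotFrontage"].tolist())  # [65.0, 75.0, 85.0]
print(reduced_data["LotArea"].isna().sum())  # 1: other columns keep their NaN
```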

Second approach:

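import pandas as pd
from sklearn.preprocessing import Imputer  # scikit-learn < 0.22 only

reduced_data['LotFrontage'] = pd.to_numeric(reduced_data['LotFrontage'], errors='coerce')
imputer = Imputer()  # defaults: missing_values='NaN', strategy='mean', axis=0
features = reduced_data.iloc[:, :-1].values
imputer.fit(features[:, 2:3])  # 2:3 keeps a 2-D (n_samples, 1) array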

Answer 4

Use 'NaN' instead of 'Nan' in the missing_values parameter: imputer=Imputer(missing_values='NaN',strategy='mean',axis=0)

Answer 5

This should work:

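from sklearn.preprocessing import Imputer  # scikit-learn < 0.22 only
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(df.iloc[:, 2:3])  # 2:3 selects column 2 but keeps it 2-D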
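Why 2:3 rather than 2: scikit-learn transformers expect a 2-D input of shape (n_samples, n_features), and recent versions reject a 1-D array with an "Expected 2D array" error. A minimal sketch with toy data mirroring the question's first three columns shows the difference between the two selections:

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the first three columns of the question's data.
df = pd.DataFrame({
    "MSSubClass": [60, 20, 60],
    "MSZoning": ["RL", "RL", "RL"],
    "LotFrontage": [65.0, np.nan, 68.0],
})

as_series = df.iloc[:, 2]   # a Series: values are 1-D, shape (3,)
as_frame = df.iloc[:, 2:3]  # a one-column DataFrame: 2-D, shape (3, 1)
print(as_series.to_numpy().shape)  # (3,)
print(as_frame.shape)              # (3, 1)

# The same holds for NumPy arrays such as features[:, 2] vs features[:, 2:3].
features = df.to_numpy()
print(features[:, 2].shape, features[:, 2:3].shape)  # (3,) (3, 1)
```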

5 solutions

Solution 1: 1, 2018-06-04 11:19:52
Solution 2: 1, 2018-12-16 08:01:34
Solution 3: 0, 2018-02-27 14:11:03
Solution 4: 0, 2018-11-19 22:04:14
Solution 5: 0, 2019-09-27 15:55:27
