scikit-learn: Error fitting model - input contains NaN, infinity or a value too large for float64

Question

My problem seems to be the same as in earlier posts (post-1, post-2 and post-3). I followed their solutions, but I still get the same error, so I'm posting it here to ask for advice.

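I'm using basic sklearn functionality. The raw data contains missing values, so I use Imputer to fill them with the median. Then I use LabelEncoder to convert nominal features into numeric ones. After that, I use StandardScaler to standardize the dataset.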
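For readers on a current scikit-learn: `sklearn.preprocessing.Imputer` was deprecated in 0.20 and removed in 0.22 in favour of `sklearn.impute.SimpleImputer`. The sketch below chains the same impute-then-scale steps with a `LinearRegression` in a `Pipeline`. The toy arrays are hypothetical, not the poster's data. Note that the imputer only ever transforms X; the target passes through untouched, so NaNs in y are not filled by this setup.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: NaN only in X, y is fully finite
X = np.array([[1.0, np.nan], [2.0, 1.0], [3.0, 1.0], [4.0, np.nan], [5.0, 2.0]])
y = np.array([2.0, 4.1, 6.0, 8.2, 9.9])

model = make_pipeline(
    SimpleImputer(strategy="median"),  # fill each column's NaNs with that column's median
    StandardScaler(),                  # zero mean, unit variance per column
    LinearRegression(),
)
model.fit(X, y)                        # would still fail if y contained NaN
predictions = model.predict(X)
print(predictions)
```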

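The problem occurs in the LinearRegression step. I get "ValueError: Input contains NaN, infinity or a value too large for dtype('float64').", but I did check the dataset and there are no NaN, infinite or too-large values...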
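One detail worth checking is how the NaN test is written. In `np.isnan(scaler.any())` and `np.isfinite(scaler.all())`, the reduction runs first and collapses the array to a single bool, so these two lines print the same value whether or not the array contains NaN. The element-wise forms below test every entry before reducing; lines such as `np.any(np.isnan(scaler))` in the post already do this, so they do cover X. None of the checks in the post look at `target`. The array here is a made-up example.

```python
import numpy as np

X = np.array([[1.0, np.nan], [2.0, 3.0]])  # hypothetical array with one NaN

# Reduction first: X.any() is a single bool, and isnan(bool) is always False
print(np.isnan(X.any()))     # False, even though X contains a NaN
# Reduction first: X.all() is a single bool, and isfinite(bool) is always True
print(np.isfinite(X.all()))  # True

# Element-wise first, then reduce: these actually inspect every entry
print(np.isnan(X).any())     # True
print(np.isfinite(X).all())  # False
```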

I really don't know why this error appears. If you have any clues, please feel free to comment. Thanks!

The code I'm using is:

import csv
import numpy as np
from sklearn import preprocessing 
from sklearn import linear_model
from sklearn.preprocessing import Imputer
from sklearn.preprocessing import StandardScaler

out_file = 'raw.dat'      
dataset = np.loadtxt(out_file, delimiter=',')
data = dataset[:, 0:-1]   # select all columns except the last (features)
target = dataset[:, -1]   # select the last column

# to handle missing values
imp = Imputer(missing_values='NaN', strategy='median', axis=0)
imp.fit(data)
data_imp = imp.transform(data)

# label Encoder: converting nominal features
le = preprocessing.LabelEncoder()
le.fit(data_imp[:, 2])
print le.classes_
le.transform(data_imp[:, 2])

le.fit(data_imp[:, 3])
print le.classes_
le.transform(data_imp[:, 3])

print '# of data: ', len(target)

scaler = preprocessing.StandardScaler().fit_transform(data_imp)
scaler = scaler.astype(np.float64, copy=False)

np.savetxt("newdata2.csv", scaler, delimiter=",")
ols = linear_model.LinearRegression()
for x in xrange(2, len(scaler)):
    print x
    scaler = scaler[:x, 1:]
    print scaler
    print np.isnan(scaler.any()) # False
    print np.any(np.isnan(scaler)) # False

    print np.isfinite(scaler.all()) # True
    print np.all(np.isfinite(scaler)) # True

    ols.fit(scaler, target)
    print ols
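Two things in the code above change what reaches `ols.fit`. First, `le.transform(...)` returns a new array, and the result is never assigned, so columns 2 and 3 are left unencoded. Second, `scaler = scaler[:x, 1:]` overwrites `scaler` on every pass, and `target` is never sliced to match, so X and y can end up with different row counts. The sketch below, written for Python 3, assigns the encoding back and slices into new names. `data_imp` and `target` here are hypothetical stand-ins built from the first rows of `raw.dat`, columns 0 to 3 only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for data_imp: first six rows of raw.dat, columns 0-3
data_imp = np.array([
    [1, 2.0, 14002, 1],
    [2, 2.8, 14002, 2],
    [3, 2.1, 14002, 1],
    [4, 1.6, 14002, 2],
    [5, 8.9, 14010, 1],
    [6, 7.3, 14010, 1],
])
target = np.array([4800.0, 999.0, 4000.0, 9950.0, 9000.0, 5000.0])

# fit_transform() returns a new array; assign it back or the encoding is lost
for col in (2, 3):
    data_imp[:, col] = LabelEncoder().fit_transform(data_imp[:, col])

X = StandardScaler().fit_transform(data_imp)

ols = LinearRegression()
for n in range(2, len(X) + 1):   # + 1 so the last pass uses every row
    X_sub = X[:n, 1:]            # new name, so X itself is never shrunk
    y_sub = target[:n]           # slice y too, so both have n rows
    ols.fit(X_sub, y_sub)
print(ols.coef_)
```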

The error message is shown below.

Traceback (most recent call last):
  File ".\data_export.py", line 123, in prep
    ols.fit(scaler, target)
  File "C:\Python27\lib\site-packages\sklearn\linear_model\base.py", line 427, in fit
    y_numeric=True, multi_output=True)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 513, in check_X_y
    dtype=None)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 398, in check_array
    _assert_all_finite(array)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
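`check_X_y` validates `y` as well as `X`. In the scikit-learn releases of that period, the `check_array(..., dtype=None)` call inside `check_X_y` appears to be the one that checks `y`, which would point at `target` rather than the scaled features. That is an inference from the traceback, not something the post confirms, but it fits the code: `target` is never passed through the Imputer. The minimal reproduction below, with made-up arrays, shows that a NaN in y alone is enough to raise a `ValueError`. The exact message differs between scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])  # no NaN, no infinity
y = np.array([1.0, np.nan, 3.0])     # NaN only in the target

error = None
try:
    LinearRegression().fit(X, y)
except ValueError as exc:
    error = exc                       # raised by input validation of y
print(error)
```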

Part of the raw data (raw.dat) is shown below:

1, 2.0, 14002, 1, 1965, 1, 1, 2, NaN, 771, 648.0, 4800.0
2, 2.8, 14002, 2, 1924, 3, 1, 4, NaN, 1400, 714.0, 999.0
3, 2.1, 14002, 1, 1965, 1, 1, 2, NaN, 725, 675.0, 4000.0
4, 1.6, 14002, 2, 1914, 2, 1, 3, 1, 1530, 620.0, 9950.0
5, 8.9, 14010, 1, 1973, 2, 1, 3, NaN, 1048, 705.0, 9000.0
6, 7.3, 14010, 1, 1982, 1, 1, 2, 1, 880, 656.0, 5000.0
......
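`np.loadtxt` reads the `NaN` entries as float `nan`, so counting NaNs per column, separately for the features and the target, shows where they are. The sketch below hardcodes the six rows shown above; in this excerpt only column 8 has NaNs and the target has none, so the same check would need to be run on the full file to see whether any target values are missing.

```python
import numpy as np

# The six rows of raw.dat shown in the post, with NaN written as np.nan
dataset = np.array([
    [1, 2.0, 14002, 1, 1965, 1, 1, 2, np.nan, 771, 648.0, 4800.0],
    [2, 2.8, 14002, 2, 1924, 3, 1, 4, np.nan, 1400, 714.0, 999.0],
    [3, 2.1, 14002, 1, 1965, 1, 1, 2, np.nan, 725, 675.0, 4000.0],
    [4, 1.6, 14002, 2, 1914, 2, 1, 3, 1, 1530, 620.0, 9950.0],
    [5, 8.9, 14010, 1, 1973, 2, 1, 3, np.nan, 1048, 705.0, 9000.0],
    [6, 7.3, 14010, 1, 1982, 1, 1, 2, 1, 880, 656.0, 5000.0],
])
data, target = dataset[:, :-1], dataset[:, -1]

nan_per_column = np.isnan(data).sum(axis=0)  # NaN count for each feature column
print(nan_per_column)          # [0 0 0 0 0 0 0 0 4 0 0]
print(np.isnan(target).sum())  # 0 in this excerpt
```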

After filling in the missing values and standardizing the data, newdata2.csv looks like this:

-1.70   -2.23   -1.64   -1.15   -0.40   -1.80   -0.86   -1.78   0.05    -1.35   0.37
-1.70   -2.14   -1.64   0.28    -2.54   0.36    -0.86   -0.56   0.05    0.21    0.75
-1.70   -2.22   -1.64   -1.15   -0.40   -1.80   -0.86   -1.78   0.05    -1.46   0.52
-1.70   -2.28   -1.64   0.28    -3.06   -0.72   -0.86   -1.17   0.05    0.53    0.20
-1.70   -1.43   -1.62   -1.15   0.01    -0.72   -0.86   -1.17   0.05    -0.66   0.69
....

Answer 1

Your raw.dat file contains NaN values. Remove that column or replace the NaNs with 0.
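The two options in this answer can be expressed directly in NumPy, as in the sketch below on a made-up array. The last step is not part of the answer: for a target vector, zero-filling would invent labels, so it shows dropping the affected rows instead, as one possible alternative.

```python
import numpy as np

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0]])  # hypothetical features with one NaN

# Option 1 from the answer: drop every column that contains a NaN
X_dropped = X[:, ~np.isnan(X).any(axis=0)]  # [[1., 3.], [4., 6.]]

# Option 2 from the answer: replace NaN with 0
X_zeroed = np.where(np.isnan(X), 0.0, X)    # [[1., 0., 3.], [4., 5., 6.]]

# Not from the answer: for the target, drop rows whose label is NaN
y = np.array([10.0, np.nan])
keep = ~np.isnan(y)
X_rows, y_rows = X_zeroed[keep], y[keep]    # keeps only the first row
```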


Solution 1
0 2016-08-19 23:46:53