
Python Sklearn - RandomForest and Missing values

I am trying to run a RandomForest on a dataset that contains missing values.

My dataset looks like this:

train_data = [['1' 'NaN' 'NaN' '0.0127034' '0.0435092']
 ['1' 'NaN' 'NaN' '0.0113187' '0.228205']
 ['1' '0.648' '0.248' '0.0142176' '0.202707']
 ..., 
 ['1' '0.357' '0.470' '0.0328121' '0.255039']
 ['1' 'NaN' 'NaN' '0.00311825' '0.0381745']
 ['1' 'NaN' 'NaN' '0.0332604' '0.2857']]

To impute the 'NaN' values, I am using:

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values='NaN',strategy='mean',axis=0)
imp.fit(train_data[0::,1::])
new_train_data=imp.transform(train_data)

But I get the following error:

Traceback (most recent call last):
  File "./RandomForest.py", line 72, in <module>
    new_train_data=imp.transform(train_data)
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py", line 388, in transform
    values = np.repeat(valid_statistics, n_missing)
  File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 343, in repeat
    return repeat(repeats, axis)
ValueError: a.shape[axis] != len(repeats)

I then did:

new_train_data = imp.fit_transform(train_data)

Then I get this error:

Traceback (most recent call last):
  File "./RandomForest.py", line 82, in <module>
    forest = forest.fit(train_data[0::,1::],train_data[0::,0])
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 224, in fit
    X, = check_arrays(X, dtype=DTYPE, sparse_format="dense")
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 283, in check_arrays
    _assert_all_finite(array)
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 43, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

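Is there a problem with the package? Can somebody help me? What does this mean?

You fit the imputer on columns 1:: but then try to apply it to all columns. That does not work: the imputer learned statistics for four columns, while the array passed to transform has five. Fit and transform on the same columns instead: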

new_train_data = imp.fit_transform(train_data)
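
Note that the second traceback in the question shows forest.fit being called on train_data rather than new_train_data, so the NaN values are presumably still present at that point. The sketch below puts the pieces together under some assumptions: it uses sklearn.impute.SimpleImputer, which replaced the Imputer class from the question in more recent scikit-learn releases, and it uses a small made-up array (with two label values) in the same layout as the question's data. The forest settings are illustrative only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data in the question's layout:
# column 0 is the label, columns 1: are the features, all stored as strings.
train_data = np.array([
    ['1', 'NaN', 'NaN', '0.0127034', '0.0435092'],
    ['0', '0.648', '0.248', '0.0142176', '0.202707'],
    ['1', '0.357', '0.470', '0.0328121', '0.255039'],
    ['0', 'NaN', 'NaN', '0.0332604', '0.2857'],
])

# Convert the strings to floats; the string 'NaN' becomes np.nan.
data = train_data.astype(float)
X = data[:, 1:]
y = data[:, 0].astype(int)

# Fit and transform on the SAME feature columns, so the shapes agree.
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
X_imputed = imp.fit_transform(X)

# Train on the imputed features, not on the original array.
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_imputed, y)
```

With the older Imputer class the same idea should apply: call fit (or fit_transform) and transform on the same column slice, then pass the transformed array to the forest.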

