Scikit-learn RandomForestClassifier error

I am using Python 3.5, with NumPy, SciPy, and matplotlib installed and imported.

When I try:

# Import the random forest package
from sklearn.ensemble import RandomForestClassifier

# Create the random forest object which will include all the parameters
# for the fit
forest = RandomForestClassifier(n_estimators = 1)

# Fit the training data to the Survived labels and create the decision trees
forest = forest.fit(train_data[0::,1::],train_data[0::,0])

# Take the same decision trees and run it on the test data
output = forest.predict(test_data)

(test_data and train_data are both float arrays), I get the following error:

C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\fixes.py:64: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  if 'order' in inspect.getargspec(np.copy)[0]:
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
Traceback (most recent call last):
  File "C:/Users/Uri/PycharmProjects/titanic1/fdsg.py", line 54, in <module>
    output = forest.predict(test_data)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\ensemble\forest.py", line 461, in predict
    X = check_array(X, ensure_2d=False, accept_sparse="csr")
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 352, in check_array
    _assert_all_finite(array)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 52, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Process finished with exit code 1
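
The traceback shows the ValueError being raised from forest.predict(test_data), so the offending values are in test_data rather than in the training array. A minimal sketch for locating them with NumPy follows; the helper name report_non_finite and the small sample array are hypothetical stand-ins, not part of the original post.

```python
import numpy as np

def report_non_finite(arr):
    """Return (nan_positions, inf_positions) as arrays of [row, col] indices."""
    arr = np.asarray(arr, dtype=np.float64)
    nan_positions = np.argwhere(np.isnan(arr))
    inf_positions = np.argwhere(np.isinf(arr))
    return nan_positions, inf_positions

# Hypothetical stand-in for the question's test_data
test_data = np.array([[1.0, 2.0],
                      [np.nan, 4.0],
                      [5.0, np.inf]])

nan_pos, inf_pos = report_non_finite(test_data)
print("NaN at:", nan_pos.tolist())
print("inf at:", inf_pos.tolist())
```

If both lists come back empty for the real array, the remaining possibility named in the error message is a finite value that is too large for the dtype.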
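
The error means the array passed to the forest contains a NaN, an infinite value, or a number too large for the dtype. The script below reproduces the NaN and the too-large cases on random data and shows one possible fix for each:

from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Imputer was removed from sklearn.preprocessing in scikit-learn 0.22;
# SimpleImputer is its replacement, so fall back only on old installs.
try:
    from sklearn.impute import SimpleImputer as Imputer
except ImportError:
    from sklearn.preprocessing import Imputer

X = np.random.randint(0, (2**31) - 1, (500, 4)).astype(object)
y = np.random.randint(0, 2, 500)
clf = RandomForestClassifier()
print(X.max())
clf.fit(X, y)  # OK
print("First fit OK")

# Case 1: the data contains missing values
X[0, 0] = np.nan  # replace one of the cells with NaN
# clf.fit(X, y)  # on the scikit-learn version from the question, this raises the same ValueError

# One way to handle NaN values is to impute them, here with the column median:
imp = Imputer(strategy='median')
X_ok = imp.fit_transform(X)
clf.fit(X_ok, y)

# Case 2: the data contains huge integers
X[0, 0] = 2**128  # a value this large triggers the same error
# clf.fit(X, y)  # gives the same error
# One way to handle this is to clip the values to some cap.
# 2**63 is only an example; choose bounds that make sense for your application.
X_ok = X.clip(-2**63, 2**63)
clf.fit(X_ok, y)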
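
Because the question's error is raised at predict time, the same cleaning has to be applied to the test features, using statistics learned from the training features. The sketch below illustrates that idea with NumPy only; it is a simplified stand-in for median imputation, not scikit-learn's implementation, and the function names and sample arrays are hypothetical.

```python
import numpy as np

def fit_column_medians(train_features):
    """Per-column medians of the training features, ignoring NaN."""
    return np.nanmedian(np.asarray(train_features, dtype=np.float64), axis=0)

def fill_with_medians(features, medians):
    """Return a copy in which NaN and infinite cells hold the column median."""
    filled = np.array(features, dtype=np.float64)  # np.array copies by default
    bad = ~np.isfinite(filled)
    # np.nonzero(bad)[1] gives the column index of every bad cell
    filled[bad] = np.take(medians, np.nonzero(bad)[1])
    return filled

# Hypothetical stand-ins for the training and test feature arrays
train_features = np.array([[1.0, 10.0],
                           [3.0, np.nan],
                           [5.0, 30.0]])
test_features = np.array([[np.nan, 7.0],
                          [2.0, np.inf]])

medians = fit_column_medians(train_features)
test_clean = fill_with_medians(test_features, medians)
print(medians)
print(test_clean)
```

Computing the medians on the training data and reusing them for the test data keeps the two arrays on the same footing, which is the same reason an imputer is fitted once and then used to transform both.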
