Sklearn ValueError: Input contains NaN, infinity or a value too large for dtype('float32')
I fit the following pipeline classifier:
Pipeline(memory=None,steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
('kbest', SelectKBest(k=1218,score_func=<function mutual_info_classif at 0x7fec1e4991f0>)),
('classifier',RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
class_weight='balanced_subsample',
criterion='gini', max_depth=15,
max_features='log2',
max_leaf_nodes=5, max_samples=0.6,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=2,
min_samples_split=15,
min_weight_fraction_leaf=0.0,
n_estimators=50, n_jobs=None,
oob_score=True, random_state=42,
verbose=0, warm_start=False))],verbose=False)
It fits fine, but when I call predict on my test data I get a ValueError:
*** ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
I checked for infinite values and NaNs. The function raising the error is _assert_all_finite, located in sklearn/utils/validation.py. I imported the function directly, ran it on the X_test array, and got no errors:
from sklearn.utils import validation
validation._assert_all_finite(X_test)
How can I get an error with the exact same data when I run the predict method on the classifier? It clearly doesn't have any NaNs or Infs, or it would raise an error when I call the function directly. Somewhere along the predict method it creates those values, but I don't know when, where, or why. Any help would be much appreciated!
Here's the full error message:
Traceback (most recent call last):
File "testz.py", line 159, in <module>
testing(dx_type, population, dx_option, feat_sel_metric, data_types, ratio_name, model_selection_metric, repo_path)
File "testz.py", line 107, in testing
y_test_pred=top_clf.predict(X_test)
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/metaestimators.py", line 116, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/pipeline.py", line 420, in predict
return self.steps[-1][-1].predict(Xt, **predict_params)
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 612, in predict
proba = self.predict_proba(X)
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 656, in predict_proba
X = self._validate_X_predict(X)
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 412, in _validate_X_predict
return self.estimators_[0]._validate_X_predict(X, check_input=True)
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 380, in _validate_X_predict
X = check_array(X, dtype=DTYPE, accept_sparse="csr")
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 577, in check_array
_assert_all_finite(array,
File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 57, in _assert_all_finite
raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
It is impossible to know how you want to handle this; the problem is laid out plainly in the error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
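The error message names three possible causes, so it can help to see which one applies before choosing a fix. Below is a minimal sketch, assuming the data is numeric and fits in a pandas DataFrame; the helper name summarize_non_finite is hypothetical and is not part of scikit-learn or pandas. It counts, per column, the NaNs, the infinities, and the values that are finite in float64 but exceed the float32 range.

```python
import numpy as np
import pandas as pd


def summarize_non_finite(X):
    """Per-column counts of NaN, +/-inf, and values beyond the float32 range.

    Hypothetical helper for illustration; assumes X is numeric.
    """
    df = pd.DataFrame(X).astype(np.float64)
    values = df.to_numpy()
    f32_max = np.finfo(np.float32).max

    is_nan = np.isnan(values)
    is_inf = np.isinf(values)
    # Finite as float64, but can overflow to inf when cast to float32
    too_large = np.isfinite(values) & (np.abs(values) > f32_max)

    report = pd.DataFrame(
        {
            "nan": is_nan.sum(axis=0),
            "inf": is_inf.sum(axis=0),
            "too_large_for_float32": too_large.sum(axis=0),
        },
        index=df.columns,
    )
    # Keep only the columns that have at least one problem
    return report[report.sum(axis=1) > 0]
```

Calling summarize_non_finite(X_test) returns an empty frame when every value is finite and within the float32 range.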
What we can do is make some assumptions about how you want this handled. We don't see how or where you create X_test, but I assume it comes from train_test_split and that it is a pandas DataFrame, given the traceback.
So, you could do the following:
# Assumes X_train and X_test are pandas DataFrames
import numpy as np
# First, replace all infinity values with NaN
X_train.replace([np.inf, -np.inf], np.nan, inplace=True)
# Then, replace NaN values with whatever you like. This example uses 0
X_train.fillna(0, inplace=True)
# You'll probably want to repeat the same for X_test
# First, replace all infinity values with NaN
X_test.replace([np.inf, -np.inf], np.nan, inplace=True)
# Then, replace NaN values with whatever you like. This example uses 0
X_test.fillna(0, inplace=True)
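If a direct check on X_test passes, as in the question, the traceback hints at where to look next: the pipeline hands the transformed data (Xt) to the final step, and the tree code calls check_array with a float32 dtype. A value that is finite in float64 can therefore become inf after the cast, and an earlier step may also change the values. The following is only a sketch of how one might narrow that down; first_non_finite_step is a hypothetical helper, not a scikit-learn function, and it assumes each step returns a dense numeric array.

```python
import numpy as np


def _has_non_finite(X, dtype):
    """True if X contains NaN/inf once cast to `dtype`."""
    with np.errstate(over="ignore", invalid="ignore"):
        as_target = np.asarray(X, dtype=np.float64).astype(dtype)
    return not np.isfinite(as_target).all()


def first_non_finite_step(steps, X, dtype=np.float32):
    """Run (name, transformer) pairs in order and report where values break.

    Returns "<input>" if X itself is the problem, the name of the first
    step whose output is non-finite after the cast, or None if all is fine.
    Hypothetical helper for illustration only.
    """
    if _has_non_finite(X, dtype):
        return "<input>"
    Xt = X
    for name, transformer in steps:
        Xt = transformer.transform(Xt)
        if _has_non_finite(Xt, dtype):
            return name
    return None
```

With the fitted pipeline from the question this could be called as first_non_finite_step(top_clf.steps[:-1], X_test), which skips the final classifier. A result of "<input>" would suggest that X_test holds values beyond the float32 range even though they are finite as float64.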