
Sklearn ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

I fit the following pipeline classifier:

Pipeline(memory=None,steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
                   ('kbest', SelectKBest(k=1218,score_func=<function mutual_info_classif at 0x7fec1e4991f0>)),
                   ('classifier',RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                           class_weight='balanced_subsample',
                                           criterion='gini', max_depth=15,
                                           max_features='log2',
                                           max_leaf_nodes=5, max_samples=0.6,
                                           min_impurity_decrease=0.0,
                                           min_impurity_split=None,
                                           min_samples_leaf=2,
                                           min_samples_split=15,
                                           min_weight_fraction_leaf=0.0,
                                           n_estimators=50, n_jobs=None,
                                           oob_score=True, random_state=42,
                                           verbose=0, warm_start=False))],verbose=False)

It fits fine, but when I call predict on my test data I get a ValueError:

*** ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I checked for infinite values and NaNs. The function raising the error is _assert_all_finite, located in sklearn/utils/validation.py. I imported the function directly, ran it on the X_test array, and got no errors:

from sklearn.utils import validation
validation._assert_all_finite(X_test)

How can I get an error on the exact same data when I run the predict method on the classifier? The data clearly doesn't contain any NaNs or Infs, or the function would raise an error when I call it directly. Somewhere inside the predict method those values are created, but I don't know when, where, or why. Any help would be much appreciated!

Here's the full error message:

Traceback (most recent call last):
  File "testz.py", line 159, in <module>
    testing(dx_type, population, dx_option, feat_sel_metric, data_types, ratio_name, model_selection_metric, repo_path)
  File "testz.py", line 107, in testing
    y_test_pred=top_clf.predict(X_test)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/metaestimators.py", line 116, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/pipeline.py", line 420, in predict
    return self.steps[-1][-1].predict(Xt, **predict_params)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 612, in predict
    proba = self.predict_proba(X)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 656, in predict_proba
    X = self._validate_X_predict(X)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 412, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 380, in _validate_X_predict
    X = check_array(X, dtype=DTYPE, accept_sparse="csr")
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 577, in check_array
    _assert_all_finite(array,
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 57, in _assert_all_finite
    raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

It is impossible to know how you want to handle this; the problem is laid out plainly in the error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
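One possible reason the direct check passes while predict fails: the traceback shows the tree calling check_array(X, dtype=DTYPE), which converts the input to float32 before testing for finiteness. A value that is finite as float64 can overflow to infinity in that conversion. A minimal sketch with made-up numbers (not your data):

```python
import numpy as np

# Made-up values for illustration: 1e39 is finite as float64 but exceeds
# the largest representable float32 (about 3.4e38).
x64 = np.array([1.0, 1e39], dtype=np.float64)
finite_before = bool(np.isfinite(x64).all())

# The cast overflows to inf; NumPy's overflow warning is silenced for the demo.
with np.errstate(over="ignore"):
    x32 = x64.astype(np.float32)
finite_after = bool(np.isfinite(x32).all())

print(finite_before, finite_after)  # True False
```

Whether this is what happens in your case depends on your data and on what the earlier pipeline steps produce, so treat it as one candidate explanation rather than a diagnosis.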

What we can do is make some assumptions about how you want this handled. We don't see how or where you create X_test, but I assume it comes from train_test_split and, judging from the traceback, that it is a pandas DataFrame.

So, you could do the following:

import numpy as np
import pandas as pd  # X_train and X_test are assumed to be DataFrames

# First, replace all infinity values with nan
X_train.replace([np.inf, -np.inf], np.nan, inplace=True)

# Then, replace nan values with whatever you like. This example uses 0
X_train.fillna(0, inplace=True)

# You'll probably want to repeat the same for X_test

# First, replace all infinity values with nan
X_test.replace([np.inf, -np.inf], np.nan, inplace=True)

# Then, replace nan values with whatever you like. This example uses 0
X_test.fillna(0, inplace=True)
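Before filling anything in, it may help to see which columns are affected once the data is in the float32 form the classifier checks. The helper below is a hypothetical sketch (the name non_finite_columns is not part of scikit-learn) and the demo array is toy data:

```python
import numpy as np

def non_finite_columns(X, dtype=np.float32):
    """Return indices of columns that hold NaN or inf after casting to dtype."""
    # Silence the overflow warning; overflowing values become inf in the cast.
    with np.errstate(over="ignore"):
        arr = np.asarray(X, dtype=dtype)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    bad = ~np.isfinite(arr)
    return np.flatnonzero(bad.any(axis=0)).tolist()

# Toy data standing in for the matrix the classifier receives.
demo = np.array([[0.1, 1e39, 0.3],
                 [0.2, 0.5, np.nan]])
print(non_finite_columns(demo))  # [1, 2]
```

Assuming top_clf is the fitted Pipeline and your scikit-learn version supports pipeline slicing, something like non_finite_columns(top_clf[:-1].transform(X_test)) would inspect what the classifier actually receives after the scaler and feature selection, rather than the raw X_test.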


