
Sklearn ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

I fit the following pipeline classifier:

Pipeline(memory=None,steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
                   ('kbest', SelectKBest(k=1218,score_func=<function mutual_info_classif at 0x7fec1e4991f0>)),
                   ('classifier',RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                           class_weight='balanced_subsample',
                                           criterion='gini', max_depth=15,
                                           max_features='log2',
                                           max_leaf_nodes=5, max_samples=0.6,
                                           min_impurity_decrease=0.0,
                                           min_impurity_split=None,
                                           min_samples_leaf=2,
                                           min_samples_split=15,
                                           min_weight_fraction_leaf=0.0,
                                           n_estimators=50, n_jobs=None,
                                           oob_score=True, random_state=42,
                                           verbose=0, warm_start=False))],verbose=False)

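It fits fine, but when I call predict on my test data, I get a ValueError:

*** ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I checked for infinities and NaNs. The function that raises the error is _assert_all_finite, located in sklearn/utils/validation.py. I imported that function directly and ran it on the X_test array, and it raised no error: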

from sklearn.utils import validation
validation._assert_all_finite(X_test)

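How can I get an error on exactly the same data when I run the classifier's predict method? The data clearly contains no NaNs or infs, otherwise the function would have raised an error when I called it directly. Somewhere inside the predict method these values are being created, but I can't tell when, where, or why... Any help would be much appreciated!

Here is the full error message: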

Traceback (most recent call last):
  File "testz.py", line 159, in <module>
    testing(dx_type, population, dx_option, feat_sel_metric, data_types, ratio_name, model_selection_metric, repo_path)
  File "testz.py", line 107, in testing
    y_test_pred=top_clf.predict(X_test)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/metaestimators.py", line 116, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/pipeline.py", line 420, in predict
    return self.steps[-1][-1].predict(Xt, **predict_params)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 612, in predict
    proba = self.predict_proba(X)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 656, in predict_proba
    X = self._validate_X_predict(X)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 412, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 380, in _validate_X_predict
    X = check_array(X, dtype=DTYPE, accept_sparse="csr")
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 577, in check_array
    _assert_all_finite(array,
  File "/home/user/anaconda3/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 57, in _assert_all_finite
    raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

It is impossible to know how you want to handle this; the problem is stated in the error itself: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
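The "too large for dtype('float32')" part of the message is easy to overlook. The traceback shows the tree calling check_array(X, dtype=DTYPE, ...), and the error message indicates that this dtype is float32. A minimal NumPy-only sketch (the array is made up for illustration) of how a value can be finite as float64 and still become infinity once cast:

```python
import numpy as np

# Made-up data: 1e39 is a valid float64 but exceeds the float32 maximum (~3.4e38).
X = np.array([[0.5, 1e39]], dtype=np.float64)
finite_as_float64 = bool(np.isfinite(X).all())

# The cast overflows to inf; silence NumPy's overflow warning for this demo.
with np.errstate(over="ignore"):
    X32 = X.astype(np.float32)
finite_as_float32 = bool(np.isfinite(X32).all())

print(finite_as_float64, finite_as_float32)
print("float32 max:", np.finfo(np.float32).max)
```

This is only one possible cause. Whether it applies here depends on what X_test actually contains.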

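What we can do is make some assumptions about how you would like to handle it. We can't see how or where you created X_test, but given the traceback I am assuming that it came from train_test_split and that it is a pandas DataFrame.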
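Under that same assumption (X_test is a pandas DataFrame with numeric columns), a rough sketch like the one below can show which columns would be non-finite after a float32 cast. The helper name non_finite_columns and the X_demo frame are hypothetical and are not part of pandas or scikit-learn.

```python
import numpy as np
import pandas as pd


def non_finite_columns(df: pd.DataFrame) -> pd.Series:
    """Count, per numeric column, values that are NaN or inf after a float32 cast."""
    numeric = df.select_dtypes(include="number")
    # Cast via float64 first, then to float32 (the dtype named in the error).
    with np.errstate(over="ignore"):
        as_f32 = numeric.to_numpy(dtype=np.float64).astype(np.float32)
    bad = ~np.isfinite(as_f32)
    counts = pd.Series(bad.sum(axis=0), index=numeric.columns)
    return counts[counts > 0]


# Hypothetical stand-in for X_test: "a" has a NaN, "b" has a value too large for float32.
X_demo = pd.DataFrame({
    "a": [0.1, np.nan, 0.3],
    "b": [1.0, 2.0, 1e39],
    "c": [1.0, 2.0, 3.0],
})
result = non_finite_columns(X_demo)
print(result)
```

Columns that do not appear in the result are safe to cast. The ones that do appear are candidates for imputation, clipping, or removal.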

So, you could do the following:

# Assumes X_train and X_test are pandas DataFrames
import numpy as np

# First, replace all infinity values with NaN
X_train.replace([np.inf, -np.inf], np.nan, inplace=True)

# Then, replace NaN values with whatever you like. This example uses 0
X_train.fillna(0, inplace=True)

# You'll probably want to repeat the same for X_test

# First, replace all infinity values with NaN
X_test.replace([np.inf, -np.inf], np.nan, inplace=True)

# Then, replace NaN values with whatever you like. This example uses 0
X_test.fillna(0, inplace=True)
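As for why a direct check on X_test can pass while predict fails: inside a Pipeline the classifier does not see X_test itself but the output of the earlier steps, and the traceback shows that output being validated with dtype=DTYPE. The sketch below reproduces one such scenario on synthetic data. The arrays, the reduced two-step pipeline, and the 1e39 value are all assumptions for illustration, not the data or pipeline from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins for the real train/test split.
rng = np.random.RandomState(42)
X_train = rng.rand(40, 3)
y_train = np.array([0, 1] * 20)
X_test = rng.rand(5, 3)
X_test[0, 0] = 1e39  # finite as float64, too large for float32

pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("classifier", RandomForestClassifier(n_estimators=5, random_state=42)),
])
pipe.fit(X_train, y_train)

# The raw test data passes a finiteness check in its own dtype (float64).
raw_is_finite = bool(np.isfinite(X_test).all())

# Apply every step except the classifier, then cast as the forest would.
Xt = pipe[:-1].transform(X_test)
with np.errstate(over="ignore"):
    Xt32 = np.asarray(Xt).astype(np.float32)
bad_rows, bad_cols = np.where(~np.isfinite(Xt32))

# predict validates the float32 version and rejects it.
try:
    pipe.predict(X_test)
    predict_failed = False
except ValueError:
    predict_failed = True

print(raw_is_finite, bad_rows, bad_cols, predict_failed)
```

Running the same pipe[:-1].transform check on the real fitted pipeline and test data should show whether the non-finite values are already present before the classifier step, and in which rows and columns.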

