How to fix: ValueError: Input contains NaN, infinity or a value too large for dtype('float32')?

Question

from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.metrics import fbeta_score, make_scorer
import keras.backend as K
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.base import BaseEstimator, ClassifierMixin
import pandas as pd

class CustomThreshold(BaseEstimator, ClassifierMixin):
    """ Custom threshold wrapper for binary classification"""
    def __init__(self, base, threshold=0.5):
        self.base = base
        self.threshold = threshold
    def fit(self, *args, **kwargs):
        self.base.fit(*args, **kwargs)
        return self
    def predict(self, X):
        return (self.base.predict_proba(X)[:, 1] > self.threshold).astype(int)

dataset_clinical = np.genfromtxt("/content/drive/MyDrive/Colab Notebooks/BreastCancer-master/Data/stacked_metadata.csv",delimiter=",")
X = dataset_clinical[:,0:450]
Y = dataset_clinical[:,450]
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=1)
rf = RandomForestClassifier(n_estimators=10).fit(X,Y) 
clf = [CustomThreshold(rf, threshold) for threshold in [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]]

for model in clf:
    print(confusion_matrix(y_test, model.predict(X_test)))
for model in clf:
    print(confusion_matrix(Y, model.predict(X)))

The traceback shows the following:

Traceback (most recent call last):
  File "RF.py", line 33, in <module>
    rf = RandomForestClassifier(n_estimators=10).fit(X,Y)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_forest.py", line 328, in fit
    X, y, multi_output=True, accept_sparse="csc", dtype=DTYPE
  File "/usr/local/lib/python3.7/dist-packages/sklearn/base.py", line 576, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 968, in check_X_y
    estimator=estimator,
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 792, in check_array
    _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 116, in _assert_all_finite
    type_err, msg_dtype if msg_dtype is not None else X.dtype
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Answer 1

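This can happen inside scikit-learn, depending on what you are doing. I recommend reading the documentation for the functions you are using. You may be calling one that requires, for example, a positive definite matrix, and passing it a matrix that does not meet that criterion.

Try removing the unexpected values. The first check below returns True if the matrix contains any NaN; the second returns False if any entry is NaN or infinite: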

np.any(np.isnan(your_matrix))
np.all(np.isfinite(your_matrix))
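Those two calls only report whether bad values exist; they do not remove anything. A minimal sketch of one way to locate and drop the offending rows follows. The small `data` array is a hypothetical stand-in for the question's `dataset_clinical`, and dropping rows is just one option (imputing is another). One possible source of NaN in the question, not confirmed for this dataset, is that `np.genfromtxt` returns `nan` for any field it cannot parse as a float, such as a header row or an empty cell.

```python
import numpy as np

# Hypothetical stand-in for the array loaded in the question:
# two feature columns followed by a label column.
data = np.array([
    [1.0, 2.0, 0.0],
    [np.nan, 3.0, 1.0],   # missing value
    [4.0, np.inf, 0.0],   # infinite value
    [1e39, 1.0, 0.0],     # finite in float64, but overflows float32
    [5.0, 6.0, 1.0],
])

# scikit-learn's forests cast the input to float32, so also flag values
# whose magnitude exceeds the largest representable float32.
float32_max = np.finfo(np.float32).max
usable = np.isfinite(data) & (np.abs(data) <= float32_max)

# A row is kept only if every entry in it is usable.
good_rows = usable.all(axis=1)
print("rows to drop:", np.flatnonzero(~good_rows))

clean = data[good_rows]
X, y = clean[:, :-1], clean[:, -1]
```

After filtering, `X` and `y` contain only finite values that fit in float32, so the validation step that raised the ValueError should no longer reject them.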

Answer 2

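At first glance, I would say check your dataset for missing values, outliers, and so on.

A large part of building any ML model is data exploration and preprocessing. Here is a beginner's guide I found that uses pandas: https://towardsdatascience.com/data-visualization-exploration-using-pandas-only-beginner-a0a52eb723d5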
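As a rough illustration of that kind of check, the sketch below counts missing values per column with pandas and then fills them. The column names `f1`, `f2`, and `label` and the tiny DataFrame are hypothetical, and median imputation is only one assumed choice, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's CSV file.
df = pd.DataFrame({
    "f1": [1.0, np.nan, 4.0, 5.0],
    "f2": [2.0, 3.0, np.inf, 6.0],
    "label": [0, 1, 0, 1],
})

# Treat infinities as missing so a single check covers both cases.
df = df.replace([np.inf, -np.inf], np.nan)

# Number of missing entries in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Fill missing feature values with the column median (one option among many).
features = df.drop(columns="label")
features = features.fillna(features.median())
labels = df["label"]
```

Whether to drop or impute depends on how much data is affected; if only a handful of rows are missing values, `df.dropna()` may be the simpler choice.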

Solution 1
0 2021-12-18 14:23:13

Solution 2
0 2021-12-18 15:48:00
