How to resolve: ValueError: Input contains NaN, infinity or a value too large for dtype('float32')?

Question

from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.metrics import fbeta_score, make_scorer
import keras.backend as K
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.base import BaseEstimator, ClassifierMixin
import pandas as pd

class CustomThreshold(BaseEstimator, ClassifierMixin):
    """ Custom threshold wrapper for binary classification"""
    def __init__(self, base, threshold=0.5):
        self.base = base
        self.threshold = threshold
    def fit(self, *args, **kwargs):
        self.base.fit(*args, **kwargs)
        return self
    def predict(self, X):
        return (self.base.predict_proba(X)[:, 1] > self.threshold).astype(int)

dataset_clinical = np.genfromtxt("/content/drive/MyDrive/Colab Notebooks/BreastCancer-master/Data/stacked_metadata.csv",delimiter=",")
X = dataset_clinical[:,0:450]
Y = dataset_clinical[:,450]
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=1)
rf = RandomForestClassifier(n_estimators=10).fit(X,Y) 
clf = [CustomThreshold(rf, threshold) for threshold in [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]]

for model in clf:
    print(confusion_matrix(y_test, model.predict(X_test)))
for model in clf:
    print(confusion_matrix(Y, model.predict(X)))

The traceback shows the following:

Traceback (most recent call last):
  File "RF.py", line 33, in <module>
    rf = RandomForestClassifier(n_estimators=10).fit(X,Y)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_forest.py", line 328, in fit
    X, y, multi_output=True, accept_sparse="csc", dtype=DTYPE
  File "/usr/local/lib/python3.7/dist-packages/sklearn/base.py", line 576, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 968, in check_X_y
    estimator=estimator,
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 792, in check_array
    _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 116, in _assert_all_finite
    type_err, msg_dtype if msg_dtype is not None else X.dtype
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Answer 1

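This can happen inside scikit-learn, depending on what you are doing. I recommend reading the documentation for the functions you are using. You may be calling one that requires, for example, a positive definite matrix, while your matrix does not meet that criterion.

Try checking for unexpected values with the following. The first expression is True if the matrix contains any NaN; the second is True only if every entry is finite: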

np.any(np.isnan(your_matrix))
np.all(np.isfinite(your_matrix))
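
The two checks above only say whether a problem exists. A minimal sketch of how one might go further, locating the offending entries and dropping the affected rows, is shown below. The helper names `report_non_finite` and `drop_non_finite_rows` and the small `data` array are hypothetical and not part of the question. One possible source of NaN in a setup like the asker's, which may or may not apply here, is that `np.genfromtxt` returns NaN for fields it cannot parse as numbers, such as a header row or empty cells.

```python
import numpy as np

def report_non_finite(matrix):
    """Return the (row, column) index pairs of every NaN or infinite entry."""
    bad = ~np.isfinite(matrix)
    return np.argwhere(bad)

def drop_non_finite_rows(matrix):
    """Keep only the rows in which every entry is finite."""
    keep = np.isfinite(matrix).all(axis=1)
    return matrix[keep]

# Hypothetical toy data standing in for the loaded CSV.
data = np.array([
    [1.0, 2.0, 0.0],
    [np.nan, 5.0, 1.0],
    [7.0, np.inf, 0.0],
    [3.0, 4.0, 1.0],
])

bad_positions = report_non_finite(data)  # where the problems are
clean = drop_non_finite_rows(data)       # rows that are safe to pass to fit()

print(bad_positions)
print(clean.shape)
```

Dropping rows is only one option; if many rows are affected, imputing the missing values may be preferable.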

Answer 2

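At first glance, I would say check your dataset for missing values, outliers, and so on.

A large part of building any ML model is data exploration and preprocessing. I found a guide for beginners using Pandas: https://towardsdatascience.com/data-visualization-exploration-using-pandas-only-beginner-a0a52eb723d5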
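
As a rough illustration of that kind of check, the sketch below loads a small CSV with pandas, counts missing values per column, and fills the gaps before handing the data to scikit-learn. The inline `csv_text`, the column names `f1`, `f2`, and `label`, and the choice of median imputation are all assumptions made for the example; the asker's file and the appropriate strategy may differ.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV with a header row and two empty cells.
csv_text = "f1,f2,label\n1.0,2.0,0\n,5.0,1\n7.0,,0\n3.0,4.0,1\n"

# read_csv treats the first row as column names and empty cells as NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Treat infinities as missing too, then fill gaps with each column's median.
features = df.drop(columns="label").replace([np.inf, -np.inf], np.nan)
imputed = features.fillna(features.median())

X = imputed.to_numpy()
y = df["label"].to_numpy()

# The same condition scikit-learn enforces before fitting.
assert np.isfinite(X).all()
```

If dropping incomplete rows is acceptable, `df.dropna()` is a simpler alternative to imputation.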
