How to resolve: ValueError: Input contains NaN, infinity or a value too large for dtype('float32')?

from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.base import BaseEstimator, ClassifierMixin

class CustomThreshold(BaseEstimator, ClassifierMixin):
    """Custom threshold wrapper for binary classification."""
    def __init__(self, base, threshold=0.5):
        self.base = base
        self.threshold = threshold
    def fit(self, *args, **kwargs):
        self.base.fit(*args, **kwargs)
        return self
    def predict(self, X):
        # predict class 1 when P(class 1) exceeds the threshold, else class 0
        return (self.base.predict_proba(X)[:, 1] > self.threshold).astype(int)

# load the CSV into a 2-D float array
dataset_clinical = np.genfromtxt("/content/drive/MyDrive/Colab Notebooks/BreastCancer-master/Data/stacked_metadata.csv", delimiter=",")
X = dataset_clinical[:, 0:450]   # columns 0-449: features
Y = dataset_clinical[:, 450]     # column 450: label
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=1)
# this fit() call is the one that raises the ValueError in the traceback below
# note: the forest is fitted on all of X, Y, so the test-set confusion
# matrices printed below are computed on samples the model has already seen
rf = RandomForestClassifier(n_estimators=10).fit(X, Y)
clf = [CustomThreshold(rf, threshold) for threshold in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]

# confusion matrix on the test split, one per threshold
for model in clf:
    print(confusion_matrix(y_test, model.predict(X_test)))
# confusion matrix on the full dataset, one per threshold
for model in clf:
    print(confusion_matrix(Y, model.predict(X)))
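One plausible source of the NaN, assuming the CSV has a header row or blank cells (the file itself is not shown): `np.genfromtxt` with its default float dtype returns NaN for any field it cannot parse, including column names, and also for empty fields. The sketch below uses a made-up in-memory CSV in place of `stacked_metadata.csv`.

```python
import io

import numpy as np

# Hypothetical CSV contents: a header row plus one empty cell.
csv_text = "f1,f2,label\n0.5,1.2,0\n0.7,,1\n"

# Default float dtype: the header strings and the empty cell become NaN.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(np.isnan(raw).any(axis=1))   # which rows contain at least one NaN

# skip_header=1 drops the header line; the empty cell is still NaN.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(np.isnan(data).any(axis=1))
```

If the real file has a header, `skip_header=1` removes the all-NaN first row, but genuinely empty cells still have to be dropped or imputed.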

The traceback shows the following:

Traceback (most recent call last):
  File "RF.py", line 33, in <module>
    rf = RandomForestClassifier(n_estimators=10).fit(X,Y)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_forest.py", line 328, in fit
    X, y, multi_output=True, accept_sparse="csc", dtype=DTYPE
  File "/usr/local/lib/python3.7/dist-packages/sklearn/base.py", line 576, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 968, in check_X_y
    estimator=estimator,
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 792, in check_array
    _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 116, in _assert_all_finite
    type_err, msg_dtype if msg_dtype is not None else X.dtype
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
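The traceback shows the error is raised by scikit-learn's input validation (`check_X_y` → `check_array` → `_assert_all_finite`), which runs before any trees are built. A minimal sketch that exercises the same check directly, assuming a scikit-learn version in which `check_array` rejects non-finite values by default; `rejected_by_validation` is a hypothetical helper, and `dtype=np.float32` mirrors the float32 dtype named in the error message:

```python
import numpy as np
from sklearn.utils import check_array


def rejected_by_validation(arr):
    """Return True if check_array, casting to float32, rejects arr."""
    try:
        # silence any overflow warning numpy may emit during the float32 cast
        with np.errstate(over="ignore"):
            check_array(arr, dtype=np.float32)  # finite values required by default
    except ValueError:
        return True
    return False


print(rejected_by_validation(np.array([[1.0, 2.0], [3.0, 4.0]])))     # all finite
print(rejected_by_validation(np.array([[1.0, np.nan], [3.0, 4.0]])))  # NaN
print(rejected_by_validation(np.array([[1.0, np.inf], [3.0, 4.0]])))  # infinity
print(rejected_by_validation(np.array([[1.0, 1e39], [3.0, 4.0]])))    # too large for float32
```

The last case illustrates the "value too large for dtype('float32')" part of the message: 1e39 is a finite float64 but exceeds the float32 maximum (about 3.4e38), so it becomes infinity after the cast.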

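This can happen inside scikit-learn, depending on what you are doing. I recommend reading the documentation for the functions you are using: you may be calling one that requires, for example, a positive-definite matrix, and your matrix does not meet that requirement.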

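Check for unexpected values with the following (the first returns True if any value is NaN; the second returns True only if every value is finite, i.e. neither NaN nor infinite), then remove or replace them: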

np.any(np.isnan(your_matrix))
np.all(np.isfinite(your_matrix))
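The two checks above only say whether a bad value exists somewhere. A sketch that also locates the offending rows and columns, flags finite values too large for float32 (the dtype in the error), and drops the affected samples; the function names are hypothetical, and dropping rows is only one option (imputation is another):

```python
import numpy as np

FLOAT32_MAX = np.finfo(np.float32).max


def bad_value_mask(a):
    """True where a holds NaN, +/-inf, or a value that overflows float32."""
    a = np.asarray(a, dtype=np.float64)
    # comparisons with NaN are False, so NaN is caught by isfinite alone
    return ~np.isfinite(a) | (np.abs(a) > FLOAT32_MAX)


def drop_bad_rows(X, y):
    """Drop every sample whose features or label contain a bad value."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    keep = ~bad_value_mask(X).any(axis=1) & ~bad_value_mask(y)
    return X[keep], y[keep]


X = np.array([[0.5, 1.0], [np.nan, 2.0], [3.0, np.inf], [1e39, 4.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

mask = bad_value_mask(X)
print("rows with bad values:", np.flatnonzero(mask.any(axis=1)))
print("columns with bad values:", np.flatnonzero(mask.any(axis=0)))
X_clean, y_clean = drop_bad_rows(X, y)
print(X_clean.shape)
```

The label column is checked too, since a NaN in `Y` is rejected by the same validation.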

At first glance, I would check your dataset for missing values, outliers, and similar problems.
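A quick per-column scan for both problems might look like the sketch below. The outlier rule here is the common 1.5 × IQR (Tukey) heuristic, chosen for illustration rather than as a universal definition; `column_summary` is a hypothetical helper.

```python
import numpy as np


def column_summary(X, k=1.5):
    """Per-column counts of missing values and of IQR-rule outliers.

    A value counts as an outlier when it lies more than k * IQR below the
    first quartile or above the third quartile (k=1.5 by default).
    """
    X = np.asarray(X, dtype=np.float64)
    n_missing = np.isnan(X).sum(axis=0)
    # quartiles computed while ignoring NaN
    q1, q3 = np.nanpercentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # NaN compares False on both sides, so it is not counted as an outlier
    n_outliers = ((X < low) | (X > high)).sum(axis=0)
    return n_missing, n_outliers


X = np.array([[1, 10], [2, np.nan], [3, 12], [4, 11], [100, 13]])
n_missing, n_outliers = column_summary(X)
print("missing per column:", n_missing)
print("outliers per column:", n_outliers)
```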

A large part of building any ML model is data exploration and preprocessing. Here is a beginner's guide to doing that with pandas: https://towardsdatascience.com/data-visualization-exploration-using-pandas-only-beginner-a0a52eb723d5
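As a minimal pandas sketch of that exploration step (the in-memory CSV and its column names are hypothetical stand-ins for the real file): `read_csv` treats the first line as column names by default, `isna().sum()` counts missing values per column, and `describe()` summarizes each numeric column.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV file.
csv_text = "age,size,label\n52,1.4,0\n61,,1\n47,2.2,\n"

df = pd.read_csv(io.StringIO(csv_text))  # first line becomes the column names
print(df.isna().sum())                   # missing values per column
print(df.describe())                     # count, mean, std, min, quartiles, max

# Map +/-inf to NaN so one rule covers every non-finite value,
# then drop incomplete rows before passing arrays to scikit-learn.
clean = df.replace([np.inf, -np.inf], np.nan).dropna()
X = clean.drop(columns="label").to_numpy()
y = clean["label"].to_numpy()
```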


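Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.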

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM