Random forest classifier ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

Question

I'm trying to apply the RandomForest method to a dataset, but I get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

Can someone tell me what I need to change in this function to make it work?

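import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

def ranks_RF(x_train, y_train, features_train, RESULT_PATH='Results'):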
    """Get ranks from Random Forest"""

    print("\nMétodo_Random_Forest")

    random_forest = RandomForestRegressor(n_estimators=10)
    np.nan_to_num(x_train)
    np.nan_to_num(y_train)
    random_forest.fit(x_train, y_train)

    # Get ranks by sorting twice (argsort of the argsort).
    imp_array = np.array(random_forest.feature_importances_)
    imp_order = imp_array.argsort()
    ranks = imp_order.argsort()

    # Plot Random Forest
    imp = pd.Series(random_forest.feature_importances_, index=x_train.columns)
    imp = imp.sort_values()

    imp.plot(kind="barh")
    plt.xlabel("Importance")
    plt.ylabel("Features")
    plt.title("Feature importance using Random Forest")
    # plt.show()
    plt.savefig(RESULT_PATH + '/ranks_RF.png', bbox_inches='tight')

    return ranks
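The "sort twice" step in the function above turns importances into ranks: the first `argsort` gives the indices that would sort the importances, and `argsort` applied to those indices gives each feature's position in that sorted order. A minimal NumPy sketch, using made-up importance values rather than the question's data:

```python
import numpy as np

# Hypothetical importances for four features (illustration only)
importances = np.array([0.60, 0.07, 0.13, 0.19])

order = importances.argsort()  # indices that sort ascending: [1 2 3 0]
ranks = order.argsort()        # rank of each feature, 0 = least important

print(order)  # [1 2 3 0]
print(ranks)  # [3 0 1 2]
```

These ranks are 0-based and give the most important feature the highest number, which is the opposite orientation from the `rank` column built in Answer 2 (where 1 means most important). With tied importances the default sort is not stable, so tied features may swap ranks between runs.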

Answer 2

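You didn't assign the results back when replacing the NaNs. `np.nan_to_num` returns a new array instead of modifying its argument, so `x_train` and `y_train` still contain NaN when they reach `fit`, and that raises the error.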
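A minimal NumPy sketch of the difference, using a small made-up array with one NaN and one infinity:

```python
import numpy as np

# Small array with one NaN and one +inf (illustration only)
x_train = np.array([[1.0, np.nan], [np.inf, 4.0]])

np.nan_to_num(x_train)            # returns a cleaned copy that is thrown away
still_dirty = np.isnan(x_train).any()
print(still_dirty)                # True: x_train was not modified

x_train = np.nan_to_num(x_train)  # reassign the cleaned copy
print(np.isnan(x_train).any())    # False: NaN became 0.0
print(x_train[1, 0])              # +inf became the largest finite float64
```

One caveat: by default `np.nan_to_num` replaces `inf` with the largest float64 (about 1.8e308). As far as I know, scikit-learn's tree-based models convert their input to float32, so data that originally contained infinities can still trigger the same "value too large for dtype('float32')" error after this step.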

Let's try it on an example dataset:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(data=iris['data'],
                  columns=iris['feature_names'])
df['target'] = iris['target']
# insert some NAs
df = df.mask(np.random.random(df.shape) < .1)

Here is a function like yours. I removed the plotting part, since that is a separate issue:

def ranks_RF(x_train, y_train):

    var_names = x_train.columns
    random_forest = RandomForestRegressor(n_estimators=10)
    # np.nan_to_num returns a new array, so assign the result back
    x_train = np.nan_to_num(x_train)
    y_train = np.nan_to_num(y_train)
    random_forest.fit(x_train, y_train)

    res = pd.DataFrame({
        "features": var_names,
        "importance": random_forest.feature_importances_,
    })
    res = res.sort_values(['importance'], ascending=False)
    res['rank'] = np.arange(len(res)) + 1

    return res

Running it:

ranks_RF(df.iloc[:,0:4],df['target'])

            features  importance  rank
3   petal width (cm)    0.601734     1
2  petal length (cm)    0.191613     2
0  sepal length (cm)    0.132212     3
1   sepal width (cm)    0.074442     4

Answer 3

This worked for me:

np.where(x.values >= np.finfo(np.float32).max)

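where `x` is my pandas DataFrame. If it is not float32 already, convert your DataFrame to float32 first. The call returns the row and column indices of every value at or above the largest finite float32, which is where infinities and overflowing values show up.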
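The check above flags `+inf` and oversized positive values, but not NaN (any comparison with NaN is False) and not `-inf`. A minimal NumPy sketch that catches all three and then cleans the data. The helper names `bad_value_mask` and `clean_for_float32` are hypothetical, and filling with each column's median is an assumed choice (Answer 2 fills with zeros instead). It works on a plain array or a purely numeric DataFrame:

```python
import numpy as np

# Largest finite float32 (about 3.4e38)
F32_MAX = np.finfo(np.float32).max


def bad_value_mask(x):
    """Hypothetical helper: True where a value is NaN, +/-inf, or too large for float32."""
    x = np.asarray(x, dtype=np.float64)
    # ~isfinite catches NaN and +/-inf; the abs() test catches finite float64
    # values at or beyond the float32 limit (they would overflow when cast)
    return ~np.isfinite(x) | (np.abs(x) >= F32_MAX)


def clean_for_float32(x):
    """Hypothetical helper: replace bad values with the column median, return float32."""
    x = np.asarray(x, dtype=np.float64).copy()
    x[bad_value_mask(x)] = np.nan
    col_medians = np.nanmedian(x, axis=0)  # median of the remaining valid values
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_medians[cols]
    return x.astype(np.float32)


x = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [np.inf, 1e300],
              [5.0, 4.0]])

# The comparison from this answer only finds the two entries in row 2
print(np.argwhere(x >= F32_MAX))
# The combined mask also finds the NaN at row 1, column 0
print(np.argwhere(bad_value_mask(x)))

clean = clean_for_float32(x)
print(clean)
```

A column with no valid values at all would stay NaN (and `np.nanmedian` warns), so such columns need to be dropped separately. If you prefer to keep the imputation inside a scikit-learn pipeline, `SimpleImputer(strategy="median")` from `sklearn.impute` covers the NaN part, but infinities still have to be turned into NaN beforehand.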

Answer details

Answer 1: score 0, posted 2020-02-16 19:59:49
Answer 2: score 0, accepted, posted 2020-02-27 15:25:09
Answer 3: score 0, posted 2022-01-01 06:25:21
