Random Forest Classifier ValueError: Input contains NaN, infinity or a value too large for dtype('float32')
I'm trying to apply the RandomForest method to a dataset and I get this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32')
Could someone tell me what I should change in the function to make the code work:
def ranks_RF(x_train, y_train, features_train, RESULT_PATH='Results'):
    """Get ranks from Random Forest"""
    print("\nMétodo_Random_Forest")
    random_forest = RandomForestRegressor(n_estimators=10)
    np.nan_to_num(x_train)
    np.nan_to_num(y_train)
    random_forest.fit(x_train, y_train)
    # Get rank by doing two times a sort.
    imp_array = np.array(random_forest.feature_importances_)
    imp_order = imp_array.argsort()
    ranks = imp_order.argsort()
    # Plot Random Forest
    imp = pd.Series(random_forest.feature_importances_, index=x_train.columns)
    imp = imp.sort_values()
    imp.plot(kind="barh")
    plt.xlabel("Importance")
    plt.ylabel("Features")
    plt.title("Feature importance using Random Forest")
    # plt.show()
    plt.savefig(RESULT_PATH + '/ranks_RF.png', bbox_inches='tight')
    return ranks
You did not assign the result back when you replaced the NaNs. np.nan_to_num returns a new array by default, so the data you pass to fit() still contains the NaNs and you get the error.
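A minimal sketch of that behaviour, using a small made-up array (the names a, still_has_nan and cleaned are only for illustration):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

# Calling nan_to_num without keeping the result leaves `a` untouched,
# because the default is copy=True.
np.nan_to_num(a)
still_has_nan = bool(np.isnan(a).any())

# Assigning the return value keeps the cleaned copy (NaN becomes 0.0).
cleaned = np.nan_to_num(a)

print(still_has_nan)
print(cleaned)
```

With a NumPy array you could also pass copy=False to modify it in place, but reassigning the result is the safer habit, especially when the input is a pandas DataFrame.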
Let's try an example dataset:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(data=iris['data'],
                  columns=iris['feature_names'])
df['target'] = iris['target']
# insert some NAs
df = df.mask(np.random.random(df.shape) < .1)
Here is a function like yours. I removed the plotting part, because that's a separate question altogether:
def ranks_RF(x_train, y_train):
    var_names = x_train.columns
    random_forest = RandomForestRegressor(n_estimators=10)
    # here you have to reassign back the values
    x_train = np.nan_to_num(x_train)
    y_train = np.nan_to_num(y_train)
    random_forest.fit(x_train, y_train)
    res = pd.DataFrame({
        "features": var_names,
        "importance": random_forest.feature_importances_,
    })
    res = res.sort_values(['importance'], ascending=False)
    res['rank'] = np.arange(len(res)) + 1
    return res
We run it:
ranks_RF(df.iloc[:,0:4],df['target'])
            features  importance  rank
3   petal width (cm)    0.601734     1
2  petal length (cm)    0.191613     2
0  sepal length (cm)    0.132212     3
1   sepal width (cm)    0.074442     4
This worked for me:
np.where(x.values >= np.finfo(np.float32).max)
Here x is my pandas DataFrame. The call returns the row and column positions of the values that are too large for float32. Then convert your DataFrame to float32 if it isn't already.
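As a rough sketch of how that check could be extended, the snippet below flags NaN, infinite and out-of-range entries in one pass and then cleans them before the cast. The toy frame x, the names bad and x_clean, and the choice of filling with the column median are all assumptions for illustration, not part of the original answers. Note that np.nan_to_num replaces infinity with the largest value of the input dtype, so on float64 data the result may still be too large for float32.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one NaN, one inf, one finite value beyond float32 range.
x = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.inf, 2.0, 1e39],
})

f32_max = np.finfo(np.float32).max

# True wherever an entry is NaN, infinite, or too large in magnitude for float32.
bad = ~np.isfinite(x.values) | (np.abs(x.values) > f32_max)
rows, cols = np.where(bad)  # positions of the offending entries

# Mask the offending entries, fill them with each column's median
# (an assumed strategy; dropping the rows is another option), then cast.
masked = x.mask(bad)
x_clean = masked.fillna(masked.median()).astype(np.float32)

print(list(zip(rows, cols)))
print(x_clean)
```

Whether filling, dropping rows, or fixing the upstream data is appropriate depends on the dataset; the point is only that every entry is finite and within float32 range before fit() is called.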