ValueError: Input contains NaN, infinity or a value too large for dtype('float32'). randomforest run
# fill NAs with -999
X = X_train.fillna(-999)
y = y_train.fillna(-999)
import lightgbm as lgb
import xgboost as xgb
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
NFOLDS = 8
folds = KFold(n_splits=NFOLDS)
#====================================
xgb_submission = sample_submission.copy()
xgb_submission['isFraud'] = 0
for fold_n, (train_index, valid_index) in enumerate(folds.split(X)):
    X_train_, X_valid = X.iloc[train_index], X.iloc[valid_index]
    y_train_, y_valid = y.iloc[train_index], y.iloc[valid_index]
    #xgbclf.fit(X_train_,y_train_)
    rf_clf1 = RandomForestClassifier(n_estimators=300, max_depth=10, min_samples_leaf=8,
                                     min_samples_split=8, random_state=0)
    rf_clf1.fit(X_train,y_train_)
    pred = rf_clf1.predict(X_test)
    print(pred)
I checked whether X or y contains any NaN, and neither does,
but fitting still fails with ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
> print(type(X),type(y))
> <class 'pandas.core.frame.DataFrame'> <class 'pandas.core.series.Series'>
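The error message names three separate conditions (NaN, infinity, and values too large for float32), so a check for NaN alone does not rule it out. Below is a minimal sketch of a per-column check covering all three. The helper name `report_non_finite` and the toy frame are made up for illustration and are not part of the question's code.

```python
import numpy as np
import pandas as pd


def report_non_finite(frame: pd.DataFrame) -> pd.DataFrame:
    """Count NaN, +/-inf and float32-overflowing values per numeric column."""
    numeric = frame.select_dtypes(include=[np.number]).astype("float64")
    is_inf = np.isinf(numeric)
    float32_max = np.finfo(np.float32).max
    report = pd.DataFrame({
        "nan": numeric.isna().sum(),
        "inf": is_inf.sum(),
        # Finite values whose magnitude does not fit into float32.
        "too_large": ((numeric.abs() > float32_max) & ~is_inf).sum(),
    })
    # Keep only the columns that have at least one problem.
    return report[report.sum(axis=1) > 0]


# Hypothetical toy data standing in for the real training frame.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.inf, 2.0, -np.inf],
    "c": [1e39, 0.0, 1.0],
    "d": [1.0, 2.0, 3.0],
})
print(report_non_finite(toy))
```

Running the same helper on every frame that reaches `fit` or `predict` (here that would be X, X_train and X_test) shows which object actually carries the offending values.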
When exactly does this error appear: while assigning X_train_ and X_valid, or while fitting the RandomForest model?
I also see from the code that you first define the X_train_ dataframe:
**X_train_**, X_valid = X.iloc[train_index], X.iloc[valid_index]
Whereas you fit the rf_clf1 object to a different dataset (namely X_train):
rf_clf1.fit(X_train,y_train_)
So the missing _ in the variable name may well be the cause.
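To make this concrete: `fillna` returns a new object unless `inplace=True` is passed, so X is filled while the original X_train still holds its NaN values. In scikit-learn versions whose forests do not accept missing values, fitting on X_train would then trigger the check that raises this ValueError. X_train also has more rows than the fold labels y_train_. Here is a rough sketch of the loop with the fold slice used instead. The toy data, the smaller `n_estimators` and `n_splits`, and the `fillna` on X_test are assumptions for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Hypothetical toy data; the real X_train, y_train and X_test come from the question.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(80, 3)), columns=["f0", "f1", "f2"])
X_train.iloc[::10, 0] = np.nan  # sprinkle in some missing values
y_train = pd.Series(rng.integers(0, 2, size=80))
X_test = pd.DataFrame(rng.normal(size=(20, 3)), columns=["f0", "f1", "f2"])

# fillna returns a new object: X is clean, X_train is unchanged.
X = X_train.fillna(-999)
y = y_train.fillna(-999)
print(X.isna().any().any(), X_train.isna().any().any())  # False True

folds = KFold(n_splits=4)
fold_preds = []
for fold_n, (train_index, valid_index) in enumerate(folds.split(X)):
    X_train_, X_valid = X.iloc[train_index], X.iloc[valid_index]
    y_train_, y_valid = y.iloc[train_index], y.iloc[valid_index]
    rf_clf1 = RandomForestClassifier(n_estimators=20, max_depth=10, min_samples_leaf=8,
                                     min_samples_split=8, random_state=0)
    # Fit on the fold slice (trailing underscore), not on the unfilled X_train.
    rf_clf1.fit(X_train_, y_train_)
    # Assumption: X_test may contain NaN as well, so it gets the same fill value.
    fold_preds.append(rf_clf1.predict(X_test.fillna(-999)))
```

If the error persists after this change, the test frame passed to `predict` is the next place to look, since it goes through the same input validation.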