CatBoost shows very bad results on a toy dataset
Today I tried out the amazing CatBoost library recently published by Yandex, but it gives very poor results even on a toy dataset. I tried to find the root of the problem, but the library has little documentation and few discussion topics so far, so I can't figure out what is going on. Please help me =) I'm using Anaconda 3 x64 with Python 3.6.
from sklearn.datasets import make_classification
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # was missing: plt is used for the plots below
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve, f1_score, make_scorer
from catboost import CatBoostClassifier

# Toy binary classification problem
X, y = make_classification(
    n_classes=2,
    n_clusters_per_class=2,
    n_features=10,
    n_informative=4,
    n_repeated=2,
    shuffle=True,
    random_state=564,
    n_samples=10000,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

cb = CatBoostClassifier(
    depth=3,
    custom_loss=['Accuracy', 'AUC'],
    logging_level='Silent',
    iterations=500,
    od_type='Iter',
    od_wait=20,
)
cb.fit(X_train, y_train, eval_set=(X_test, y_test), plot=True, use_best_model=True)
pred = cb.predict_proba(X_test)[:, 1]
# roc_curve returns (fpr, tpr, thresholds), in that order
fpr, tpr, _ = roc_curve(y_true=y_test, y_score=pred)

# just to show the difference
from sklearn.ensemble import GradientBoostingClassifier
gbc = GradientBoostingClassifier().fit(X_train, y_train)
pred_gbc = gbc.predict_proba(X_test)[:, 1]
fpr_gbc, tpr_gbc, _ = roc_curve(y_true=y_test, y_score=pred_gbc)

plt.plot(fpr, tpr, color='orange', label='CatBoost')
plt.plot(fpr_gbc, tpr_gbc, color='red', label='GradientBoostingClassifier')
plt.legend()
plt.show()
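The two ROC curves can also be compared as single numbers. The sketch below is not part of the original post: `holdout_auc` is a hypothetical helper name, and the fixed `random_state=0` for the split and the model is an assumption added only to make the run repeatable. It uses scikit-learn alone, so it only produces the baseline side of the comparison.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def holdout_auc(model, X_test, y_test):
    """ROC AUC of a fitted binary classifier on held-out data (hypothetical helper)."""
    scores = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
    return roc_auc_score(y_test, scores)


# Same toy dataset parameters as in the question
X, y = make_classification(
    n_classes=2,
    n_clusters_per_class=2,
    n_features=10,
    n_informative=4,
    n_repeated=2,
    shuffle=True,
    random_state=564,
    n_samples=10000,
)
# random_state=0 is an assumption; the question leaves the split unseeded
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

gbc = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
gbc_auc = holdout_auc(gbc, X_test, y_test)
print(f"GradientBoostingClassifier AUC: {gbc_auc:.3f}")
```

With CatBoost installed, calling `holdout_auc(cb, X_test, y_test)` on the fitted classifier from the question should give the other side of the comparison, since `CatBoostClassifier` also exposes `predict_proba`.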
It was a bug, and it was fixed in version 0.6.1. Make sure you are using the latest version.
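Since the answer ties the fix to release 0.6.1, it may help to check which version is actually installed. This is a minimal sketch, not an official CatBoost utility: `parse_version`, `has_fix` and `FIXED_IN` are hypothetical names, the parser only handles plain dotted numeric versions, and `importlib.metadata` assumes Python 3.8 or newer (on older interpreters, `catboost.__version__` should give the same string).

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (0, 6, 1)  # release named in the answer as containing the fix


def parse_version(text):
    """Turn '0.6.1' into (0, 6, 1); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed):
    """True if the given version string is at least 0.6.1."""
    return parse_version(installed) >= FIXED_IN


try:
    installed = version("catboost")
except PackageNotFoundError:
    print("catboost is not installed in this environment")
else:
    status = "includes" if has_fix(installed) else "predates"
    print(f"catboost {installed} {status} the 0.6.1 fix")
```

If the installed version predates the fix, upgrading with `pip install --upgrade catboost` (or the equivalent conda command) is the usual remedy.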