Understanding scikit's decision tree - inconsistent learning

Question

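I have been using the tsfresh package to find relevant features of time series. It outputs around 300 "relevant" features, each of which passed a p-value threshold for being predictive. When I train a classifier with scikit's DecisionTreeClassifier(), I get some strange results. Every time I fit the tree, it returns a tree with only two levels, and each time it uses different features. I'm confused. The tree does a good job every time, but why don't I see all the levels? Using this code: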

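from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X_filtered: the ~300 features selected by tsfresh; y: class labels
X_train, X_test, y_train, y_test = train_test_split(X_filtered, y, test_size=.2)
cl = DecisionTreeClassifier()
cl.fit(X_train, y_train)
# Write the fitted tree to a Graphviz .dot file
tree.export_graphviz(cl, out_file='tree.dot', feature_names=X.columns)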

where len(X.columns) is over 300, this returns a decision tree with only two levels every time.

Answer 1

The output of this line is random:

X_train, X_test, y_train, y_test = train_test_split(X_filtered, y, test_size=.2)

That is, every time you split the data into training and test sets, you get different sets. You can use the random_state parameter to get a reproducible split:

X_train, X_test, y_train, y_test = train_test_split(X_filtered, y, test_size=.2, random_state=4)

With a fixed seed the tree is trained on the same data on every run, so it should choose the same split features.
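A minimal sketch of how this could be checked, under stated assumptions: the data comes from make_classification as a stand-in for X_filtered and y (not the asker's data), and fit_tree is a hypothetical helper. It also passes random_state to DecisionTreeClassifier, because scikit-learn's documentation notes that features are randomly permuted at each split, so tied candidate splits may otherwise be resolved differently even on identical training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for X_filtered and y (assumption: not the asker's data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=300, n_informative=20, random_state=0
)


def fit_tree(seed):
    """Split the data and fit a tree, using one fixed seed for both steps."""
    X_train, X_test, y_train, y_test = train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed
    )
    cl = DecisionTreeClassifier(random_state=seed)
    cl.fit(X_train, y_train)
    return cl


first = fit_tree(4)
second = fit_tree(4)

# tree_.feature holds the feature index used at each node (negative for leaves).
same_features = np.array_equal(first.tree_.feature, second.tree_.feature)
print("same split features:", same_features)
print("depth:", first.get_depth())
```

Comparing tree_.feature across runs is a quick way to see whether two fits chose the same features, and get_depth() reports how many levels the fitted tree has without opening the .dot file.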

Solution 1
0 2016-12-03 11:43:18
