
Does scikit-learn's DecisionTreeRegressor do true multi-output regression?

I have run into an ML problem that requires a multi-dimensional Y. Right now we are training an independent model for each dimension of this output, which does not exploit the additional information available from the fact that the outputs are correlated.
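As a sketch of the "independent models" baseline described above, scikit-learn's MultiOutputRegressor wrapper fits one separate estimator per target dimension (the data here is synthetic, invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
# Two correlated synthetic targets: both depend on X[:, 0]
y = np.column_stack([
    X[:, 0] + 0.1 * rng.randn(100),
    X[:, 0] + X[:, 1] + 0.1 * rng.randn(100),
])

# Fits a fully independent tree for each output dimension
independent = MultiOutputRegressor(DecisionTreeRegressor(max_depth=3)).fit(X, y)
print(len(independent.estimators_))  # one fitted tree per target
```

Each tree in `estimators_` is trained on a single column of y, so no cross-output information is shared.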

I have been reading this to learn more about the few ML algorithms that have been truly extended to handle multi-dimensional outputs. Decision trees are one of them.

Does scikit-learn use "multi-target regression trees" when fit(X, Y) is given a multi-dimensional Y, or does it fit a separate tree for each dimension? I spent some time looking at the code but couldn't figure it out.

After more digging, the only difference between a tree fitted on points labeled with a one-dimensional Y and one fitted on points with multi-dimensional labels is the Criterion object it uses to decide splits. A Criterion can handle multi-dimensional labels, so the result of fitting a DecisionTreeRegressor is a single regression tree regardless of the dimension of Y.
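This single-tree behavior can be checked directly on a fitted estimator: the underlying `tree_.value` array stores one value per output dimension at every node, rather than there being separate trees (synthetic data, used only to illustrate the shapes):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
# Two synthetic outputs derived from the same feature
y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 0])])

tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# A single fitted tree; each node stores one predicted value per output
print(tree.n_outputs_)           # number of output dimensions
print(tree.tree_.value.shape)    # (n_nodes, n_outputs, 1) for regression
```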

This implies that, yes, scikit-learn does use true multi-target regression trees, which can leverage correlated outputs to positive effect.
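End to end, this means a plain fit(X, Y) with a 2-D Y trains one tree whose predict returns all outputs at once (again with made-up data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(300, 2)
# Strongly correlated synthetic outputs: the second is a noisy copy of the first
y1 = 2.0 * X[:, 0]
y = np.column_stack([y1, y1 + 0.01 * rng.randn(300)])

# One model, one tree, both targets
model = DecisionTreeRegressor(max_depth=5).fit(X, y)
pred = model.predict(X[:5])
print(pred.shape)  # (n_samples, n_outputs): both outputs from a single tree
```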
