Is there any way to get the samples under each leaf of a decision tree?

I have trained a decision tree on a dataset. Now I want to see which samples fall under which leaf of the tree.

In the plot of the tree (image not included here), I want the samples circled in red.

I am using scikit-learn's implementation of decision trees in Python.

If you only want the leaf that each sample falls into, you can use

clf.apply(iris.data)

array([ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 14, 5, 5, 5, 5, 5, 5, 10, 5, 5, 5, 5, 5, 10, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 16, 16, 16, 16, 16, 16, 6, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 16, 16, 16, 16, 16, 16, 15, 16, 16, 11, 16, 16, 16, 8, 8, 16, 16, 16, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16])
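Since the question asks for the samples under each leaf, the array returned by apply can be grouped by leaf id. The following is a minimal sketch, assuming the same iris classifier used in the rest of this answer; the name samples_by_leaf is only illustrative.

```python
import collections

import sklearn.datasets
import sklearn.tree

iris = sklearn.datasets.load_iris()
clf = sklearn.tree.DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

# apply() returns, for every sample, the id of the leaf it ends up in.
leaf_ids = clf.apply(iris.data)

# Group sample indices by leaf id (samples_by_leaf is an illustrative name).
samples_by_leaf = collections.defaultdict(list)
for sample_idx, leaf_id in enumerate(leaf_ids):
    samples_by_leaf[int(leaf_id)].append(sample_idx)

for leaf_id in sorted(samples_by_leaf):
    print(leaf_id, len(samples_by_leaf[leaf_id]))
```

Each key is a leaf id and each value is the list of row indices of iris.data that reach that leaf, so iris.data[samples_by_leaf[leaf_id]] gives the actual samples.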

If you want all the samples for each node (internal nodes as well as leaves), you can compute all the decision paths with

dec_paths = clf.decision_path(iris.data)
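To make the return value of decision_path more concrete, here is a small sketch that inspects it for one sample. It assumes the same iris classifier as the rest of this answer; sample 70 is picked only as an example.

```python
import sklearn.datasets
import sklearn.tree

iris = sklearn.datasets.load_iris()
clf = sklearn.tree.DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

dec_paths = clf.decision_path(iris.data)

# The result is sparse, with one row per sample and one column per node.
print(dec_paths.shape)

# Entry [s, n] is nonzero when sample s passes through node n.
dense = dec_paths.toarray()

# Node ids visited by sample 70, in increasing order: the first is the
# root (node 0) and the last is the leaf that apply() reports.
path_70 = dense[70].nonzero()[0]
print(path_70)
```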

Then loop over the decision paths, convert each one to a dense array with toarray(), and check for every node whether the sample passes through it. The results are stored in a defaultdict whose keys are node ids and whose values are lists of sample indices.

for d, dec in enumerate(dec_paths):
    path = dec.toarray()[0]  # dense 0/1 row: which nodes sample d visits
    for i in range(clf.tree_.node_count):
        if path[i] == 1:
            samples[i].append(d)

Complete code

import collections

import sklearn.datasets
import sklearn.tree

# Fit a decision tree on the iris data set.
iris = sklearn.datasets.load_iris()
clf = sklearn.tree.DecisionTreeClassifier(random_state=42)
clf = clf.fit(iris.data, iris.target)

# Map each node id to the indices of the samples that pass through it.
samples = collections.defaultdict(list)
dec_paths = clf.decision_path(iris.data)

for d, dec in enumerate(dec_paths):
    path = dec.toarray()[0]  # dense 0/1 row: which nodes sample d visits
    for i in range(clf.tree_.node_count):
        if path[i] == 1:
            samples[i].append(d)

Output

print(samples[13])

[70, 126, 138]
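Note that samples contains internal nodes as well as leaves: in the output shown here, node 13 holds samples that apply assigns to leaves 14 and 15. As a sketch of one way to keep only the leaves, the example below reads the nonzero entries of the sparse decision-path matrix directly instead of densifying each row, then filters on tree_.children_left, which scikit-learn sets to -1 for a leaf. The name leaf_samples is only illustrative.

```python
import collections

import sklearn.datasets
import sklearn.tree

iris = sklearn.datasets.load_iris()
clf = sklearn.tree.DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

# (sample index, node id) pairs for every node on every sample's path.
sample_idx, node_idx = clf.decision_path(iris.data).nonzero()

samples = collections.defaultdict(list)
for s, n in zip(sample_idx, node_idx):
    samples[int(n)].append(int(s))

# A node is a leaf when it has no left child (children_left == -1).
children_left = clf.tree_.children_left
leaf_samples = {
    n: sorted(idx) for n, idx in samples.items() if children_left[n] == -1
}

for n in sorted(leaf_samples):
    print(n, leaf_samples[n])
```

Because the tree was fitted on the same data, the number of samples collected for each node should match clf.tree_.n_node_samples, which is a convenient sanity check.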
