
Changing colors for decision tree plot created using export graphviz

I am using scikit-learn's regression tree function and graphviz to generate wonderful, easy-to-interpret visuals of some decision trees:

from sklearn import tree
import pydotplus

# Run.reg is my fitted tree, Xvar the list of feature names
dot_data = tree.export_graphviz(Run.reg, out_file=None,
                                feature_names=Xvar,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('CART.png')
graph.write_svg("CART.svg")


This runs perfectly, but I'd like to change the color scheme if possible. The plot represents CO2 fluxes, so I'd like to make the negative values green and the positive values brown. I can export as SVG instead and alter everything manually, but when I do, the text doesn't quite line up with the boxes, so changing the colors manually and fixing all the text adds a very tedious step to my workflow that I would really like to avoid!

Also, I've seen some trees where the length of the lines connecting nodes is proportional to the % variance explained by the split. I'd love to be able to do that too, if possible.

  • You can get a list of all the edges via graph.get_edge_list()
  • Each source node should have two target nodes; the one with the lower index is the True branch, the one with the higher index is the False branch
  • Colors can be assigned via set_fillcolor()


import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree
import collections

clf = tree.DecisionTreeClassifier(random_state=42)
iris = load_iris()

clf = clf.fit(iris.data, iris.target)

dot_data = tree.export_graphviz(clf,
                                feature_names=iris.feature_names,
                                out_file=None,
                                filled=True,
                                rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)

colors = ('brown', 'forestgreen')
edges = collections.defaultdict(list)

for edge in graph.get_edge_list():
    edges[edge.get_source()].append(int(edge.get_destination()))

for edge in edges:
    edges[edge].sort()
    # After sorting, child 0 is the True branch and child 1 the False branch
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        dest.set_fillcolor(colors[i])

graph.write_png('tree.png')
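
The snippet above colors nodes by branch (True/False). For the regression tree in the question, where green should mean a negative prediction and brown a positive one, a minimal sketch could instead read the predicted value from each node's label. The helper names (`value_from_label`, `color_for_value`, `recolor_by_sign`) are made up for illustration. The regex assumes a single-output regressor whose labels contain `value = <number>`, and zero is treated as positive here.

```python
import re

# Assumption: regressor labels contain "value = <number>" (single output).
_VALUE = re.compile(r'value = (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')


def value_from_label(label):
    """Return the node's predicted value as a float, or None if absent."""
    match = _VALUE.search(label)
    return float(match.group(1)) if match else None


def color_for_value(value, negative='forestgreen', positive='brown'):
    """Pick a fill color from the sign of the value (zero counts as positive)."""
    return negative if value < 0 else positive


def recolor_by_sign(graph):
    """Set the fill color of every labelled node; return how many were changed."""
    changed = 0
    for node in graph.get_node_list():
        label = node.get_attributes().get('label')
        if label is None:  # e.g. the default "node"/"edge" entries
            continue
        value = value_from_label(label)
        if value is None:  # e.g. classifier labels such as "value = [50, 50, 50]"
            continue
        node.set_fillcolor(color_for_value(value))
        changed += 1
    return changed
```

With a graph built from a regression tree, calling recolor_by_sign(graph) before graph.write_png(...) should be enough. The fill is only drawn when export_graphviz was called with filled=True.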

Also, I've seen some trees where the length of the lines connecting nodes is proportional to the % variance explained by the split. I'd love to be able to do that too, if possible!

You could play with set_weight() and set_len(), but that's a bit trickier and needs some fiddling to get right. Here is some code to get you started.

import re

def samples(node):
    # Read "samples = N" from the label; works with "\n" and "<br/>" separators
    return int(re.search(r'samples = (\d+)', node.get_attributes()['label']).group(1))

for edge in edges:
    edges[edge].sort()
    src = graph.get_node(edge)[0]
    total_weight = samples(src)
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        weight = samples(dest)
        # Index with i so that each child edge is updated, not only the first
        child_edge = graph.get_edge(edge, str(edges[edge][i]))[0]
        child_edge.set_weight((1 - weight / total_weight) * 100)
        child_edge.set_len(weight / total_weight)
        child_edge.set_minlen(weight / total_weight)
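
The code above scales edges by each child's share of samples. If the goal is the share of variance explained by each split, one possible sketch is to compute it from the fitted tree's arrays rather than from the labels. The function names and `max_ranks` are hypothetical. The inputs are meant to be `tree_.children_left`, `tree_.children_right`, `tree_.impurity` and `tree_.weighted_n_node_samples` of a regressor fitted with a variance-based criterion. As far as I know, dot only honors integer `minlen` values, while `len` is used by neato and fdp, so the fraction is rounded to a whole number of ranks here.

```python
def variance_explained(children_left, children_right, impurity, n_samples):
    """Map each split node id to the fraction of the root's total
    variance (impurity * samples) removed by that split."""
    total = impurity[0] * n_samples[0]
    fractions = {}
    for node, (left, right) in enumerate(zip(children_left, children_right)):
        if left == -1:  # scikit-learn marks leaves with child id -1
            continue
        drop = (impurity[node] * n_samples[node]
                - impurity[left] * n_samples[left]
                - impurity[right] * n_samples[right])
        fractions[node] = drop / total if total else 0.0
    return fractions


def to_minlen(fraction, max_ranks=4):
    """Turn a fraction in [0, 1] into an integer rank span of at least 1."""
    return max(1, round(fraction * max_ranks))


# Possible usage with a fitted regressor `reg` and the pydotplus `graph`
# (assumption: DOT node names equal the tree's node ids, as in export_graphviz):
#
#   t = reg.tree_
#   fractions = variance_explained(t.children_left, t.children_right,
#                                  t.impurity, t.weighted_n_node_samples)
#   for node, frac in fractions.items():
#       for child in (t.children_left[node], t.children_right[node]):
#           graph.get_edge(str(node), str(child))[0].set_minlen(to_minlen(frac))
```

Both edges leaving a split get the same length in this sketch, since the variance reduction belongs to the split as a whole and not to either child.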
