
Plot a directed graph in Python?

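I am trying to build a directed graph or a Sankey diagram (either would do) for customer state migration. The data looks like the following, where count is the number of users who migrated from the current state to the next state.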

**current_state         next_state          count**
New Profile              Initiated           37715
Profile Initiated          End               36411
JobRecommended             End                6202
New                        End                6171
ProfileCreated             JobRecommended     5799
Profile Initiated          ProfileCreated     4360
New                        NotOpted           3751
NotOpted                   Profile Initiated  2817
JobRecommended             InterestedInJob    2542
IntentDetected             ProfileCreated     2334
ProfileCreated             IntentDetected     1839
InterestedInJob            Applied            1671
JobRecommended             NotInterestedInJob 1477
NotInterestedInJob         ProfileCreated     1408
IntentDetected             End                1325
NotOpted                   End                1009
InterestedInJob            ProfileCreated     975
Applied                    IntentDetected     912
NotInterestedInJob         IntentDetected     720
Applied                    ProfileCreated     701
InterestedInJob            End                673

I wrote code that builds a Sankey diagram, but the chart is not easy to read. I am looking for a readable directed graph. Here is my code:

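    import pandas as pd
    import plotly.graph_objects as go

    df = pd.read_csv('input.csv')

    # Collect every distinct state and give each one an integer index,
    # because go.Sankey refers to nodes by their position in the label list.
    x = list(set(df.current_state.values) | set(df.next_state))
    di = {state: i for i, state in enumerate(x)}

    # Translate state names into node indices for the link endpoints.
    df['source'] = df['current_state'].apply(lambda y : di[y])
    df['target'] = df['next_state'].apply(lambda y : di[y])


    # Build the Sankey figure.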
    fig = go.Figure(data=[go.Sankey(
        node = dict(
          pad = 15,
          thickness = 20,
          line = dict(color = "black", width = 0.5),
          label = x,
          color = "blue"
        ),
        link = dict(
          source = df.source, 
          target = df.target,
          value = df['count']
      ))])


    # Set the title, size and margins, then display the figure.
    fig.update_layout(title_text="Sankey Diagram", font_size=10, autosize=False,
        width=1000,
        height=1000,
        margin=go.layout.Margin(
            l=50,
            r=50,
            b=100,
            t=100,
            pad=4
        ))
    fig.show()

For directed graphs, graphviz would be my tool of choice, rather than Python.

The following script, txt2dot.py, converts your data into an input file for graphviz:

    text = '''New Profile              Initiated           37715
    Profile Initiated          End               36411
    JobRecommended             End                6202
    New                        End                6171
    ProfileCreated             JobRecommended     5799
    Profile Initiated          ProfileCreated     4360
    New                        NotOpted           3751
    NotOpted                   Profile Initiated  2817
    JobRecommended             InterestedInJob    2542
    IntentDetected             ProfileCreated     2334
    ProfileCreated             IntentDetected     1839
    InterestedInJob            Applied            1671
    JobRecommended             NotInterestedInJob 1477
    NotInterestedInJob         ProfileCreated     1408
    IntentDetected             End                1325
    NotOpted                   End                1009
    InterestedInJob            ProfileCreated     975
    Applied                    IntentDetected     912
    NotInterestedInJob         IntentDetected     720
    Applied                    ProfileCreated     701
    InterestedInJob            End                673'''

    # Remove ambiguity and make suitable for graphviz.
    text = text.replace('New Profile', 'NewProfile')
    text = text.replace('New ', 'NewProfile ')
    text = text.replace('Profile Initiated', 'ProfileInitiated')
    text = text.replace(' Initiated', ' ProfileInitiated')

    # Create edges and nodes for graphviz.
    edges = [ln.split() for ln in text.splitlines()]
    edges = sorted(edges, key=lambda x: -1*int(x[2]))
    nodes = sorted(list(set(i[0] for i in edges) | set(i[1] for i in edges)))

    print('digraph foo {')
    for n in nodes:
        print(f'    {n};')
    print()
    for item in edges:
        print('    ', item[0],  ' -> ', item[1],  ' [label="', item[2], '"];', sep='')
    print('}')

Running python3 txt2dot.py > foo.dot results in:

    digraph foo {
        Applied;
        End;
        IntentDetected;
        InterestedInJob;
        JobRecommended;
        NewProfile;
        NotInterestedInJob;
        NotOpted;
        ProfileCreated;
        ProfileInitiated;

        NewProfile -> ProfileInitiated [label="37715"];
        ProfileInitiated -> End [label="36411"];
        JobRecommended -> End [label="6202"];
        NewProfile -> End [label="6171"];
        ProfileCreated -> JobRecommended [label="5799"];
        ProfileInitiated -> ProfileCreated [label="4360"];
        NewProfile -> NotOpted [label="3751"];
        NotOpted -> ProfileInitiated [label="2817"];
        JobRecommended -> InterestedInJob [label="2542"];
        IntentDetected -> ProfileCreated [label="2334"];
        ProfileCreated -> IntentDetected [label="1839"];
        InterestedInJob -> Applied [label="1671"];
        JobRecommended -> NotInterestedInJob [label="1477"];
        NotInterestedInJob -> ProfileCreated [label="1408"];
        IntentDetected -> End [label="1325"];
        NotOpted -> End [label="1009"];
        InterestedInJob -> ProfileCreated [label="975"];
        Applied -> IntentDetected [label="912"];
        NotInterestedInJob -> IntentDetected [label="720"];
        Applied -> ProfileCreated [label="701"];
        InterestedInJob -> End [label="673"];
    }

Running dot -o foo.png -Tpng foo.dot gives:

[Image: the graph rendered by graphviz]
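With every edge drawn at the same thickness, the labels are the only cue to relative volume. One possible refinement, sketched here as an assumption rather than part of the original answer, is to set graphviz's penwidth edge attribute from the count. The function name dot_with_weighted_edges and the logarithmic scaling are illustrative choices; rankdir=LR simply lays the graph out left to right.

```python
import math


def dot_with_weighted_edges(edges, max_width=8.0):
    """Return DOT text in which each edge's penwidth grows with log(count).

    `edges` is a list of (source, target, count) tuples; counts are assumed
    to be integers >= 1 and node names are assumed to contain no spaces.
    """
    top = max(count for _, _, count in edges)
    lines = ["digraph foo {", "    rankdir=LR;"]
    for source, target, count in edges:
        if top > 1:
            # Log scale, so the largest edge does not dwarf all the others.
            width = 1.0 + (max_width - 1.0) * math.log(count) / math.log(top)
        else:
            width = 1.0
        lines.append(
            f'    {source} -> {target} [label="{count}", penwidth={width:.2f}];'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("NewProfile", "ProfileInitiated", 37715),
        ("ProfileInitiated", "End", 36411),
        ("InterestedInJob", "End", 673),
    ]
    print(dot_with_weighted_edges(sample))
```

The output can be piped to dot in the same way as the output of txt2dot.py.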

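This will create a basic Sankey diagram, assuming you:

  1. Saved your data in a file named state_migration.csv
  2. Replaced the spaces in the labels (state names) with dashes, underscores, or nothing
  3. Replaced the spaces between columns with commas
  4. Have plotly, numpy and matplotlib installed

Steps 2 and 3 are easy to do in any non-prehistoric text editor, or even in Python itself if there is a lot of data. I strongly recommend avoiding spaces in unquoted values.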
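For larger inputs, steps 2 and 3 could be scripted along the following lines. This is a minimal sketch, not the cleaning procedure actually used for this answer. The function names clean_rows and rows_to_csv are hypothetical. It assumes the last whitespace-separated token on each line is the count and that the two state columns are separated by at least two spaces, as in the question's data. It does not decide whether "New" and "New Profile" are the same state.

```python
import csv
import io
import re


def clean_rows(raw_text):
    """Parse whitespace-aligned rows into (source, target, count) tuples."""
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is always the last token on the line.
        states, count = line.rsplit(None, 1)
        # The two state names are separated by a run of 2+ spaces.
        source, target = re.split(r"\s{2,}", states)
        # Step 2: drop the spaces inside state names.
        rows.append((source.replace(" ", ""), target.replace(" ", ""), int(count)))
    return rows


def rows_to_csv(rows):
    """Step 3: render the rows as comma-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["current_state", "next_state", "count"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    raw = """New Profile              Initiated           37715
NotOpted                   Profile Initiated  2817
JobRecommended             NotInterestedInJob 1477"""
    print(rows_to_csv(clean_rows(raw)), end="")
```

Splitting the count off first matters because some rows, such as the NotInterestedInJob ones, have only a single space before the number.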

[Image: the resulting Sankey diagram]

    import plotly.graph_objects as go
    import numpy as np
    import matplotlib.colors

    if __name__ == '__main__':

      # read the CSV, skipping the header row; each row is [source, target, count]
      with open('state_migration.csv', 'r') as finput:
        info = [line.strip().lower().split(',')
                for line in finput.readlines()[1:] if line.strip()]
      info_t = [*map(list, zip(*info))] # info transposed

      # this exists to map the data to plotly's node indexing format
      index = {n: i for i, n in enumerate(set(info_t[0]+info_t[1]))}

      fig = go.Figure(data=[go.Sankey(
        node = dict(
          pad = 15,
          thickness = 20,
          line = dict(color = "black", width = 0.5),
          label = list(index.keys()),
          # one random named color per node, without repeats
          color = np.random.choice( list(matplotlib.colors.cnames.values()),
                                    size=len(index), replace=False )
        ),
        link = dict(
          source = [index[_] for _ in info_t[0]],
          target = [index[_] for _ in info_t[1]],
          value = [int(_) for _ in info_t[2]] # counts are read as strings
      ))])

      fig.update_layout(title_text="State Migration", font_size=12)
      fig.show()

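You can drag the nodes around. If you want to predefine their positions or check other parameters, see the Plotly documentation for Sankey diagrams.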
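As a rough illustration of predefining positions, the sketch below computes one (x, y) pair per state from its breadth-first distance to a starting state. Everything here is an assumption layered on top of the answer: the function name layered_positions is hypothetical, the layout rule is only one of many possible choices, and states unreachable from the start are left out. The values are kept strictly between 0 and 1 as a precaution. They are meant to be passed as the x and y lists of the Sankey node dict, in the same order as the labels.

```python
from collections import deque


def layered_positions(edges, start):
    """Map each state reachable from `start` to an (x, y) pair in (0, 1).

    x grows with the breadth-first distance from `start`; y spreads the
    states that share a distance evenly from top to bottom.
    `edges` is a list of (source, target) pairs that must include `start`.
    """
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, []).append(target)
        neighbours.setdefault(target, [])

    # Breadth-first search; visited states are skipped, so cycles are safe.
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    deepest = max(depth.values()) or 1
    columns = {}
    for node, d in depth.items():
        columns.setdefault(d, []).append(node)

    positions = {}
    for d, members in columns.items():
        for rank, node in enumerate(sorted(members)):
            x = 0.01 + 0.98 * d / deepest
            y = (rank + 1) / (len(members) + 1)
            positions[node] = (round(x, 3), round(y, 3))
    return positions


if __name__ == "__main__":
    demo = [("new", "notopted"), ("new", "end"), ("notopted", "end")]
    pos = layered_positions(demo, "new")
    labels = list(pos)
    # Hypothetical use with plotly (not executed here):
    # node = dict(label=labels,
    #             x=[pos[n][0] for n in labels],
    #             y=[pos[n][1] for n in labels])
    print(pos)
```

Because the distance is the shortest one, a state such as end, which is reachable directly from new, lands in an early column. Adjust the rule if the terminal state should sit at the far right.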

The data I used is a cleaned-up version of your input:

    currentstate,next_state,count
    new,initiated,37715
    profileinitiated,end,36411
    jobrecommended,end,6202
    new,end,6171
    profilecreated,jobrecommended,5799
    profileinitiated,profilecreated,4360
    new,notopted,3751
    notopted,profileinitiated,2817
    jobrecommended,interestedinjob,2542
    intentdetected,profilecreated,2334
    profilecreated,intentdetected,1839
    interestedinjob,applied,1671
    jobrecommended,notinterestedinjob,1477
    notinterestedinjob,profilecreated,1408
    intentdetected,end,1325
    notopted,end,1009
    interestedinjob,profilecreated,975
    applied,intentdetected,912
    notinterestedinjob,intentdetected,720
    applied,profilecreated,701
    interestedinjob,end,673

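I changed "New Profile" to the existing state "New", because the graph looked odd otherwise. Feel free to adjust as needed.

The libraries I used are by no means required for what you want; I am simply more familiar with them. For the directed graph, Roland Smith has you covered. It can also be done with Plotly; see their gallery.

  • Alternatives to Plotly, in order of preference: matplotlib, seaborn, ggplot, raw dot/graphviz
  • matplotlib is only used here to provide a list of predefined hex colors
  • numpy is only used to pick random values from a list without replacement (in this case, colors)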
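Since numpy and matplotlib play such small roles, the color selection could also be done with the standard library alone. The following is a sketch under that assumption. PALETTE is an illustrative hard-coded list of hex colors, standing in for matplotlib's named colors, and pick_colors is a hypothetical helper name.

```python
import random

# Illustrative palette of ten hex colors (an assumption; any list works).
PALETTE = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def pick_colors(n, palette=PALETTE, seed=None):
    """Return n distinct colors chosen at random from the palette.

    random.sample draws without replacement, like
    np.random.choice(..., replace=False), and raises ValueError
    if n exceeds the palette size.
    """
    rng = random.Random(seed)
    return rng.sample(palette, n)


if __name__ == "__main__":
    print(pick_colors(10, seed=0))
```

The returned list can be passed as the color argument of the Sankey node dict, one entry per state.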

Tested on Python 3.8.1

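It looks like condekind has the answer covered, but since you are using pandas, these earlier answers should help with the practical side of organizing the data and generating the diagram:

How to define the structure of a Sankey diagram using a pandas dataframe?

Draw a Sankey diagram from a dataframe

alishobeiri has many useful examples and code you can use: https://plot.ly/~alishobeiri/1591/plotly-sankey-diagrams/#/

There is also the plot.ly documentation, which answers the specific question of node placement.

If the Sankey diagram is messy, remember that you can also try a vertical orientation instead of a horizontal one.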
