Plot a directed graph in Python?

I am trying to make a directed graph or a Sankey diagram (either would work) for customer state migration. The data looks like the table below, where count is the number of users migrating from the current state to the next state.

**current_state         next_state          count**
New Profile              Initiated           37715
Profile Initiated          End               36411
JobRecommended             End                6202
New                        End                6171
ProfileCreated             JobRecommended     5799
Profile Initiated          ProfileCreated     4360
New                        NotOpted           3751
NotOpted                   Profile Initiated  2817
JobRecommended             InterestedInJob    2542
IntentDetected             ProfileCreated     2334
ProfileCreated             IntentDetected     1839
InterestedInJob            Applied            1671
JobRecommended             NotInterestedInJob 1477
NotInterestedInJob         ProfileCreated     1408
IntentDetected             End                1325
NotOpted                   End                1009
InterestedInJob            ProfileCreated     975
Applied                    IntentDetected     912
NotInterestedInJob         IntentDetected     720
Applied                    ProfileCreated     701
InterestedInJob            End                673

I have written code that builds a Sankey diagram, but the plot is not easy to read. I'm looking for a readable directed graph. Here is my code:

    import pandas as pd
    import plotly.graph_objects as go

    df = pd.read_csv('input.csv')

    # Every distinct state becomes one node; plotly refers to nodes by integer index.
    labels = sorted(set(df['current_state']) | set(df['next_state']))
    index = {state: i for i, state in enumerate(labels)}

    # Translate state names into node indices for the links.
    df['source'] = df['current_state'].map(index)
    df['target'] = df['next_state'].map(index)

    # One link per row, weighted by the number of migrating users.
    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            line=dict(color="black", width=0.5),
            label=labels,
            color="blue",
        ),
        link=dict(
            source=df['source'],
            target=df['target'],
            value=df['count'],
        ),
    )])

    # Fixed-size figure with explicit margins.
    fig.update_layout(
        title_text="Sankey Diagram",
        font_size=10,
        autosize=False,
        width=1000,
        height=1000,
        margin=go.layout.Margin(l=50, r=50, b=100, t=100, pad=4),
    )
    fig.show()

For directed graphs, graphviz would be my tool of choice rather than a Python plotting library.

The following script, txt2dot.py, converts your data into an input file for graphviz:

    text = '''New Profile              Initiated           37715
    Profile Initiated          End               36411
    JobRecommended             End                6202
    New                        End                6171
    ProfileCreated             JobRecommended     5799
    Profile Initiated          ProfileCreated     4360
    New                        NotOpted           3751
    NotOpted                   Profile Initiated  2817
    JobRecommended             InterestedInJob    2542
    IntentDetected             ProfileCreated     2334
    ProfileCreated             IntentDetected     1839
    InterestedInJob            Applied            1671
    JobRecommended             NotInterestedInJob 1477
    NotInterestedInJob         ProfileCreated     1408
    IntentDetected             End                1325
    NotOpted                   End                1009
    InterestedInJob            ProfileCreated     975
    Applied                    IntentDetected     912
    NotInterestedInJob         IntentDetected     720
    Applied                    ProfileCreated     701
    InterestedInJob            End                673'''

    # Remove ambiguity and make suitable for graphviz.
    text = text.replace('New Profile', 'NewProfile')
    text = text.replace('New ', 'NewProfile ')
    text = text.replace('Profile Initiated', 'ProfileInitiated')
    text = text.replace(' Initiated', ' ProfileInitiated')

    # Create edges and nodes for graphviz.
    edges = [ln.split() for ln in text.splitlines()]
    edges = sorted(edges, key=lambda x: -1*int(x[2]))
    nodes = sorted(list(set(i[0] for i in edges) | set(i[1] for i in edges)))

    print('digraph foo {')
    for n in nodes:
        print(f'    {n};')
    print()
    for item in edges:
        print('    ', item[0],  ' -> ', item[1],  ' [label="', item[2], '"];', sep='')
    print('}')

Running python3 txt2dot.py > foo.dot produces:

    digraph foo {
        Applied;
        End;
        IntentDetected;
        InterestedInJob;
        JobRecommended;
        NewProfile;
        NotInterestedInJob;
        NotOpted;
        ProfileCreated;
        ProfileInitiated;

        NewProfile -> ProfileInitiated [label="37715"];
        ProfileInitiated -> End [label="36411"];
        JobRecommended -> End [label="6202"];
        NewProfile -> End [label="6171"];
        ProfileCreated -> JobRecommended [label="5799"];
        ProfileInitiated -> ProfileCreated [label="4360"];
        NewProfile -> NotOpted [label="3751"];
        NotOpted -> ProfileInitiated [label="2817"];
        JobRecommended -> InterestedInJob [label="2542"];
        IntentDetected -> ProfileCreated [label="2334"];
        ProfileCreated -> IntentDetected [label="1839"];
        InterestedInJob -> Applied [label="1671"];
        JobRecommended -> NotInterestedInJob [label="1477"];
        NotInterestedInJob -> ProfileCreated [label="1408"];
        IntentDetected -> End [label="1325"];
        NotOpted -> End [label="1009"];
        InterestedInJob -> ProfileCreated [label="975"];
        Applied -> IntentDetected [label="912"];
        NotInterestedInJob -> IntentDetected [label="720"];
        Applied -> ProfileCreated [label="701"];
        InterestedInJob -> End [label="673"];
    }

Running dot -o foo.png -Tpng foo.dot gives:

(image: the graph rendered by graphviz)
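A possible variation on txt2dot.py, if the plain graph is still hard to read: scale each edge's line width with its count using graphviz's penwidth attribute, and lay the graph out left to right with rankdir=LR. This is a sketch, not part of the answer above; the function name to_dot and the width range are my own choices, and it assumes the edges have already been cleaned into (source, target, count) tuples. Quoting the node names also lets labels such as "Profile Initiated" keep their spaces.

```python
def to_dot(edges, min_width=1.0, max_width=8.0):
    """Build DOT source from (source, target, count) tuples.

    Edge thickness (penwidth) is scaled linearly between min_width and
    max_width so that the heaviest migrations stand out.
    """
    counts = [count for _, _, count in edges]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal

    lines = ["digraph migration {", "    rankdir=LR;"]
    # Emit the heaviest edges first, like txt2dot.py does.
    for src, dst, count in sorted(edges, key=lambda e: e[2], reverse=True):
        width = min_width + (max_width - min_width) * (count - lo) / span
        # Quoted IDs may contain spaces, e.g. "Profile Initiated".
        lines.append(f'    "{src}" -> "{dst}" [label="{count}", penwidth={width:.1f}];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("NewProfile", "ProfileInitiated", 37715),
        ("ProfileInitiated", "End", 36411),
        ("InterestedInJob", "End", 673),
    ]
    print(to_dot(sample))
```

The output can be saved to a .dot file and rendered with the same dot -o foo.png -Tpng command.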

The code below creates a basic Sankey diagram, assuming you:

  1. Save your data in a file called state_migration.csv
  2. Replace the whitespace in labels (state names) with a dash, an underscore, or nothing
  3. Replace the whitespace between columns with commas
  4. Have plotly, numpy and matplotlib installed

Steps 2 and 3 are easy to do in any modern text editor, or in Python itself if there is a lot of data. I strongly recommend you avoid whitespace in unquoted values.
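For steps 2 and 3 in Python, one possible sketch: join the known multi-word state names first, then split each row on whitespace and rejoin the fields with commas. The MULTI_WORD mapping is a hypothetical example; you would list every multi-word state in your data. Rows that still do not split into exactly three fields, such as the ambiguous first row (New Profile / Initiated), are rejected so they can be fixed by hand rather than silently misread.

```python
# Hypothetical mapping: every multi-word state name -> a version without spaces.
MULTI_WORD = {"Profile Initiated": "ProfileInitiated"}


def to_csv_line(raw):
    """Convert one whitespace-aligned row into 'current,next,count'."""
    for name, joined in MULTI_WORD.items():
        raw = raw.replace(name, joined)      # step 2: no spaces inside labels
    fields = raw.split()                     # split on any run of whitespace
    if len(fields) != 3:
        raise ValueError(f"ambiguous row, fix it by hand: {raw.strip()!r}")
    return ",".join(fields)                  # step 3: commas between columns


if __name__ == "__main__":
    print(to_csv_line("NotOpted                   Profile Initiated  2817"))
    # Several spaces between "Profile" and "Initiated" defeat the mapping,
    # so this row splits into four fields and is rejected.
    try:
        to_csv_line("New Profile              Initiated           37715")
    except ValueError as err:
        print(err)
```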

Result: (image of the resulting Sankey diagram)

    import csv

    import matplotlib.colors
    import numpy as np
    import plotly.graph_objects as go

    if __name__ == '__main__':

        # Read the cleaned CSV, lower-casing every field.
        with open('state_migration.csv', newline='') as finput:
            reader = csv.reader(finput)
            next(reader)  # skip the header row
            rows = [[field.strip().lower() for field in row]
                    for row in reader if row]
        sources, targets, counts = zip(*rows)  # transpose rows into columns

        # Map each state name to the integer node index plotly expects.
        index = {n: i for i, n in enumerate(sorted(set(sources + targets)))}

        fig = go.Figure(data=[go.Sankey(
            node=dict(
                pad=15,
                thickness=20,
                line=dict(color="black", width=0.5),
                label=list(index.keys()),
                # One random named color per node, never repeated.
                color=np.random.choice(list(matplotlib.colors.cnames.values()),
                                       size=len(index), replace=False),
            ),
            link=dict(
                source=[index[s] for s in sources],
                target=[index[t] for t in targets],
                value=[int(c) for c in counts],  # counts are read as strings
            ),
        )])

        fig.update_layout(title_text="State Migration", font_size=12)
        fig.show()

You can drag the nodes around. See the plotly Sankey documentation if you want to predefine their positions or check other parameters.

The data I used is a cleaned version of your input:

    current_state,next_state,count
    new,initiated,37715
    profileinitiated,end,36411
    jobrecommended,end,6202
    new,end,6171
    profilecreated,jobrecommended,5799
    profileinitiated,profilecreated,4360
    new,notopted,3751
    notopted,profileinitiated,2817
    jobrecommended,interestedinjob,2542
    intentdetected,profilecreated,2334
    profilecreated,intentdetected,1839
    interestedinjob,applied,1671
    jobrecommended,notinterestedinjob,1477
    notinterestedinjob,profilecreated,1408
    intentdetected,end,1325
    notopted,end,1009
    interestedinjob,profilecreated,975
    applied,intentdetected,912
    notinterestedinjob,intentdetected,720
    applied,profilecreated,701
    interestedinjob,end,673

I changed "New Profile" to the existing state "New", since the diagram looked odd otherwise. Feel free to tweak this as needed.

The libraries I used are not strictly needed for what you want; I'm simply more familiar with them. For the directed graph, Roland Smith's answer has you covered. It can also be done with Plotly; see their gallery.

  • Alternatives to Plotly, in order of preference: matplotlib, seaborn, ggplot, raw dot/graphviz
  • matplotlib was only used here to supply a list of predefined hex colors
  • numpy was only used to pick random values from that list without replacement (one color per node)

Tested on Python 3.8.1.

It looks like condekind has the answer covered, but since you are using pandas, these previous answers should help with the practical side of organising the data and producing the diagram:

How to define the structure of a sankey diagram using a pandas dataframe?

Draw Sankey Diagram from dataframe
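The core step in those answers is mapping state names to the integer node indices plotly expects. A minimal pandas sketch of that step, assuming the dataframe has the current_state, next_state and count columns from the question; sankey_links is a hypothetical helper name. pd.factorize gives each distinct state one integer code, in order of first appearance, shared across both columns.

```python
import pandas as pd


def sankey_links(df):
    """Return (labels, source, target, value) lists ready for go.Sankey."""
    # Stack both state columns so every state gets one shared integer code.
    states = pd.concat([df["current_state"], df["next_state"]], ignore_index=True)
    codes, labels = pd.factorize(states)
    n = len(df)
    return (
        list(labels),
        codes[:n].tolist(),   # codes of the current_state column
        codes[n:].tolist(),   # codes of the next_state column
        df["count"].tolist(),
    )


df = pd.DataFrame({
    "current_state": ["NewProfile", "ProfileInitiated", "NewProfile"],
    "next_state": ["ProfileInitiated", "End", "NotOpted"],
    "count": [37715, 36411, 3751],
})
labels, source, target, value = sankey_links(df)
```

The four lists can be passed directly as node label and link source, target and value.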

alishobeiri also has a number of useful examples and code you could use: https://plot.ly/~alishobeiri/1591/plotly-sankey-diagrams/#/

The plot.ly documentation also answers the specific question of node placement.

If the Sankey diagram is messy, remember that you can also try a vertical rather than horizontal orientation.
