Plot a directed graph in Python?
I am trying to make a directed graph or Sankey diagram (either would work) for customer state migration. The data looks like below; count is the number of users migrating from the current state to the next state.
**current_state next_state count**
New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673
I have written code that builds a Sankey diagram, but the plot is not easily readable. I am looking for a readable directed graph. Here is my code:
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv('input.csv')
x = list(set(df.current_state.values) | set(df.next_state))
# Map each state name to the integer node index that plotly expects
di = dict()
count = 0
for i in x:
    di[i] = count
    count += 1
#
df['source'] = df['current_state'].apply(lambda y : di[y])
df['target'] = df['next_state'].apply(lambda y : di[y])
#
fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "black", width = 0.5),
      label = x,
      color = "blue"
    ),
    link = dict(
      source = df.source,
      target = df.target,
      value = df['count']
  ))])
#
fig.update_layout(title_text="Sankey Diagram", font_size=10, autosize=False,
    width=1000,
    height=1000,
    margin=go.layout.Margin(
        l=50,
        r=50,
        b=100,
        t=100,
        pad=4
    ))
fig.show()
For directed graphs, graphviz would be my tool of choice instead of Python.
The following script, txt2dot.py, converts your data into an input file for graphviz:
text = '''New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673'''
# Remove ambiguity and make suitable for graphviz.
# The three-word first row is handled explicitly so that the later
# replacements cannot merge its two state names into one token.
text = text.replace('New Profile Initiated', 'NewProfile ProfileInitiated')
text = text.replace('Profile Initiated', 'ProfileInitiated')
text = text.replace('New ', 'NewProfile ')
# Create edges and nodes for graphviz.
edges = [ln.split() for ln in text.splitlines()]
edges = sorted(edges, key=lambda x: -1*int(x[2]))
nodes = sorted(list(set(i[0] for i in edges) | set(i[1] for i in edges)))
print('digraph foo {')
for n in nodes:
    print(f' {n};')
print()
for item in edges:
    print(' ', item[0], ' -> ', item[1], ' [label="', item[2], '"];', sep='')
print('}')
Running python3 txt2dot.py > foo.dot results in:
digraph foo {
Applied;
End;
IntentDetected;
InterestedInJob;
JobRecommended;
NewProfile;
NotInterestedInJob;
NotOpted;
ProfileCreated;
ProfileInitiated;
NewProfile -> ProfileInitiated [label="37715"];
ProfileInitiated -> End [label="36411"];
JobRecommended -> End [label="6202"];
NewProfile -> End [label="6171"];
ProfileCreated -> JobRecommended [label="5799"];
ProfileInitiated -> ProfileCreated [label="4360"];
NewProfile -> NotOpted [label="3751"];
NotOpted -> ProfileInitiated [label="2817"];
JobRecommended -> InterestedInJob [label="2542"];
IntentDetected -> ProfileCreated [label="2334"];
ProfileCreated -> IntentDetected [label="1839"];
InterestedInJob -> Applied [label="1671"];
JobRecommended -> NotInterestedInJob [label="1477"];
NotInterestedInJob -> ProfileCreated [label="1408"];
IntentDetected -> End [label="1325"];
NotOpted -> End [label="1009"];
InterestedInJob -> ProfileCreated [label="975"];
Applied -> IntentDetected [label="912"];
NotInterestedInJob -> IntentDetected [label="720"];
Applied -> ProfileCreated [label="701"];
InterestedInJob -> End [label="673"];
}
Running dot -o foo.png -Tpng foo.dot renders the graph to foo.png.
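If the rendered graph is still hard to read, one option is to let edge thickness reflect the count. The sketch below is not part of txt2dot.py: it assumes the edges are already parsed into (source, target, count) tuples and emits graphviz's penwidth edge attribute, scaled by the logarithm of the count. The function name edge_lines and the min_width and max_width bounds are made up for illustration.

```python
import math

def edge_lines(edges, min_width=1.0, max_width=8.0):
    """Return DOT edge statements whose penwidth grows with log(count).

    edges: iterable of (source, target, count) with count > 0.
    """
    edges = list(edges)
    logs = [math.log(count) for _, _, count in edges]
    lo, hi = min(logs), max(logs)
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    lines = []
    for (src, dst, count), lg in zip(edges, logs):
        width = min_width + (max_width - min_width) * (lg - lo) / span
        lines.append(f'  {src} -> {dst} [label="{count}", penwidth={width:.2f}];')
    return lines

sample = [
    ("NewProfile", "ProfileInitiated", 37715),
    ("ProfileInitiated", "End", 36411),
    ("InterestedInJob", "End", 673),
]
print("digraph foo {")
print("\n".join(edge_lines(sample)))
print("}")
```

The output can be saved to a .dot file and rendered with the same dot command as above. A log scale is used here only because the counts span two orders of magnitude; a linear scale would work the same way.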
This creates a basic Sankey diagram, assuming you:
2 and 3 are easily doable with any non-prehistoric text editor, or even Python itself if there is a lot of data. I strongly recommend you avoid working with whitespace in unquoted values.
import plotly.graph_objects as go
import numpy as np
import matplotlib.colors

if __name__ == '__main__':

    with open('state_migration.csv', 'r') as finput:
        # skip the header row; lowercase each row and split it on commas
        info = [line.strip().lower().split(',')
                for line in finput.readlines()[1:]]

    info_t = [*map(list, zip(*info))]  # info transposed

    # this exists to map the data to plotly's node indexing format
    index = {n: i for i, n in enumerate(set(info_t[0] + info_t[1]))}

    fig = go.Figure(data=[go.Sankey(
        node = dict(
            pad = 15,
            thickness = 20,
            line = dict(color = "black", width = 0.5),
            label = list(index.keys()),
            # one random named colour per node, without repeats
            color = np.random.choice(list(matplotlib.colors.cnames.values()),
                                     size=len(index), replace=False)
        ),
        link = dict(
            source = [index[s] for s in info_t[0]],
            target = [index[t] for t in info_t[1]],
            value = [int(v) for v in info_t[2]]  # counts are read as strings
        ))])

    fig.update_layout(title_text="State Migration", font_size=12)
    fig.show()
You can drag the nodes around. See this if you want to predefine their positions or check other parameters.
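As a rough illustration of predefining positions, the sketch below builds a Sankey trace as a plain dict and sets the node x and y coordinates together with the arrangement attribute. The helper names sankey_trace and show, the three sample states, and the coordinates themselves are assumptions chosen for the example, not values taken from the answer.

```python
def sankey_trace(labels, links, positions):
    """Build a Sankey trace dict with explicit node coordinates.

    labels:    node names, in the order plotly should index them
    links:     (source_label, target_label, value) tuples
    positions: label -> (x, y), each coordinate between 0 and 1
    """
    index = {name: i for i, name in enumerate(labels)}
    return dict(
        type="sankey",
        arrangement="snap",
        node=dict(
            label=list(labels),
            x=[positions[name][0] for name in labels],
            y=[positions[name][1] for name in labels],
            pad=15,
            thickness=20,
        ),
        link=dict(
            source=[index[src] for src, _, _ in links],
            target=[index[dst] for _, dst, _ in links],
            value=[value for _, _, value in links],
        ),
    )

def show(trace):
    """Display the trace; plotly is imported here so the rest runs without it."""
    import plotly.graph_objects as go
    go.Figure(data=[trace]).show()

labels = ["new", "profileinitiated", "end"]
links = [
    ("new", "profileinitiated", 37715),
    ("profileinitiated", "end", 36411),
    ("new", "end", 6171),
]
# Hypothetical layout: entry state on the left, exit state on the right.
positions = {
    "new": (0.05, 0.5),
    "profileinitiated": (0.5, 0.3),
    "end": (0.95, 0.5),
}
trace = sankey_trace(labels, links, positions)
print(trace["node"]["x"], trace["node"]["y"])
```

Calling show(trace) opens the figure when plotly is installed. Listing the labels explicitly, instead of iterating over a set, keeps the node order and therefore the coordinates stable between runs.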
The data I used was a cleaned version of your input:
current_state,next_state,count
new,initiated,37715
profileinitiated,end,36411
jobrecommended,end,6202
new,end,6171
profilecreated,jobrecommended,5799
profileinitiated,profilecreated,4360
new,notopted,3751
notopted,profileinitiated,2817
jobrecommended,interestedinjob,2542
intentdetected,profilecreated,2334
profilecreated,intentdetected,1839
interestedinjob,applied,1671
jobrecommended,notinterestedinjob,1477
notinterestedinjob,profilecreated,1408
intentdetected,end,1325
notopted,end,1009
interestedinjob,profilecreated,975
applied,intentdetected,912
notinterestedinjob,intentdetected,720
applied,profilecreated,701
interestedinjob,end,673
I changed "New Profile" to the existing state "New", since the diagram was otherwise weird. Feel free to tweak as you need.
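If you would rather do the cleanup in Python than in a text editor, something along these lines might work. It is only a sketch: it assumes the raw rows are separated by single spaces, that the full set of state names is known in advance (the STATES set below is typed in by hand), and that the first row means New -> Profile Initiated, which differs from the file above, where that row is new,initiated.

```python
import csv
import io

# Assumed set of state names; "Profile Initiated" is the only one with a space.
STATES = {
    "New", "Profile Initiated", "ProfileCreated", "JobRecommended",
    "InterestedInJob", "NotInterestedInJob", "IntentDetected",
    "Applied", "NotOpted", "End",
}

def parse_row(line):
    """Split 'state state count' into (current, next, count) using STATES."""
    names, count = line.rsplit(None, 1)
    words = names.split()
    # Try every split point until both halves are known state names.
    for i in range(1, len(words)):
        current, nxt = " ".join(words[:i]), " ".join(words[i:])
        if current in STATES and nxt in STATES:
            return current, nxt, int(count)
    raise ValueError(f"cannot split states in: {line!r}")

def to_csv(raw_text):
    """Return comma-separated text with lowercase, space-free state names."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["current_state", "next_state", "count"])
    for line in raw_text.strip().splitlines():
        current, nxt, count = parse_row(line)
        writer.writerow([current.replace(" ", "").lower(),
                         nxt.replace(" ", "").lower(),
                         count])
    return out.getvalue()

raw = """New Profile Initiated 37715
Profile Initiated End 36411
NotOpted Profile Initiated 2817"""
print(to_csv(raw))
```

Matching against a known set of names is what resolves the ambiguity in rows such as "New Profile Initiated"; rows that cannot be split into two known states raise an error instead of being guessed.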
The libraries I used are absolutely not needed for what you want; I'm simply more familiar with them. For the directed graph, Roland Smith has you covered. It can also be done with Plotly; see their gallery.
Tested on Python 3.8.1
Looks like condekind has the answer covered, but since you are using pandas, these previous answers should help with the practical side of organising the data and producing the diagram:
How to define the structure of a sankey diagram using a pandas dataframe?
Draw Sankey Diagram from dataframe
alishobeiri also has a number of useful examples and code you could use: https://plot.ly/~alishobeiri/1591/plotly-sankey-diagrams/#/
There is also the plot.ly documentation, which answers the specific question of node placement.
If the Sankey diagram is messy, remember you can also try vertical rather than horizontal orientation.
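A minimal sketch of the vertical layout, assuming plotly's Sankey trace and its orientation attribute ("v" or "h"). The helper names vertical_sankey and render and the three sample rows are illustrative only.

```python
def vertical_sankey(labels, sources, targets, values, vertical=True):
    """Build a Sankey trace dict laid out top-to-bottom or left-to-right."""
    return dict(
        type="sankey",
        orientation="v" if vertical else "h",
        node=dict(label=list(labels), pad=15, thickness=20),
        link=dict(source=list(sources), target=list(targets), value=list(values)),
    )

def render(trace, title="State Migration"):
    """Display the trace; plotly is imported here so the rest runs without it."""
    import plotly.graph_objects as go
    fig = go.Figure(data=[trace])
    fig.update_layout(title_text=title, font_size=12)
    fig.show()

# Sample rows: New -> ProfileInitiated, ProfileInitiated -> End, New -> End
labels = ["New", "ProfileInitiated", "End"]
spec = vertical_sankey(labels, sources=[0, 1, 0], targets=[1, 2, 2],
                       values=[37715, 36411, 6171])
print(spec["orientation"])
```

Calling render(spec) shows the diagram when plotly is installed; passing vertical=False gives the horizontal layout for comparison.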