What is the correct graph data structure to differentiate between nodes with the same name?

I'm learning about graphs (they seem super useful) and was wondering if I could get some advice on a possible way to structure my graphs.

Simply, let's say I get purchase order data every day; some days it is the same as the day before, and on others it is different. For example, yesterday I had an order for pencils and erasers, so I created two nodes to represent them; today I get an order for an eraser and a marker, and so on. After each day, my program also looks at who ordered what, and if Bob ordered a pencil yesterday and then an eraser today, it creates a directed edge. My logic is that I can see who bought what on each day and track Bob's purchase behaviour (and maybe use it to infer patterns for him or for other users).

My problem is that I'm using networkx (Python), and when I create a node 'pencil' for yesterday and then another node 'pencil' for day 2, I can't differentiate them.

I thought about (and have been) naming it day2-pencil and then scanning the entire graph and stripping out the 'day2-' to track pencil orders. This seems wrong to me (not to mention expensive on the processor). I think the key would be to somehow mark each day as its own subgraph, so that when I want to study a specific day or a few days, I don't have to scan the entire graph.

As my test data gets larger, it's getting more and more confusing, so I am wondering what the best practice is. Any general suggestions would be great (networkx seems pretty full-featured, so it probably has a way of doing this).

Thanks in advance!

Update: Still no luck, but this may be helpful:

import networkx as nx
G=nx.Graph()
G.add_node('pencil', day='1/1/12', colour='blue')
G.add_node('eraser', day='1/1/12', colour='rubberish colour. I know thats not a real colour')
G.add_node('pencil', day='1/2/12', colour='blue')

The result I get from typing G.node (G.nodes in newer networkx versions) is:

{'pencil': {'colour': 'blue', 'day': '1/2/12'}, 'eraser': {'colour': 'rubberish colour. I know thats not a real colour', 'day': '1/1/12'}}

It's obviously overwriting the pencil from 1/1/12 with the 1/2/12 one, and I'm not sure whether I can make a distinct one.

This mostly depends on your goal. What you want to analyze is the deciding factor in your graph design. But, looking at your data, a general structure would be nodes for Customers and Products that are connected by Days (I don't know if this helps you, but this is in fact a bipartite graph).

So your structure would be something like this:

node(Person) --- edge(Day) ---> node(Product)

Let's say Bob buys a pencil on 1/1/12:

node(Bob) --- 1/1/12 ---> node(Pencil)

OK, now Bob goes and buys another pencil on 1/2/12:

          -- 1/1/12 --
         /            \
node(Bob)              > node(Pencil)
         \            /
          -- 1/2/12 --

and so on...

This is actually possible with networkx. Since you have multiple edges between the same pair of nodes, you have to choose between MultiGraph and MultiDiGraph, depending on whether your edges are directed.

In : import networkx

In : g = networkx.MultiDiGraph()

In : g.add_node("Bob")
In : g.add_node("Alice")

In : g.add_node("Pencil")

In : g.add_edge("Bob","Pencil",key="1/1/12")
In : g.add_edge("Bob","Pencil",key="1/2/12")

In : g.add_edge("Alice","Pencil",key="1/3/12")
In : g.add_edge("Alice","Pencil",key="1/2/12")

In : g.edges(keys=True)
Out:
[('Bob', 'Pencil', '1/2/12'),
 ('Bob', 'Pencil', '1/1/12'),
 ('Alice', 'Pencil', '1/3/12'),
 ('Alice', 'Pencil', '1/2/12')]

So far, not bad. You can actually query things like "Did Alice buy a Pencil on 1/1/12?".

In : g.has_edge("Alice","Pencil","1/1/12")
Out: False

In : g.has_edge("Alice","Pencil","1/2/12")
Out: True
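A related question is "on which days did Bob buy a Pencil?". The following is a minimal sketch, assuming a networkx 2.x or later install; it rebuilds a smaller version of the graph above so that it runs on its own, and the variable name bob_pencil_days is only illustrative.

```python
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("Bob", "Pencil", key="1/1/12")
g.add_edge("Bob", "Pencil", key="1/2/12")
g.add_edge("Alice", "Pencil", key="1/2/12")

# g[u][v] maps each edge key (here, the day) to that edge's attribute dict,
# so iterating over it yields the days on which Bob ordered a Pencil.
bob_pencil_days = sorted(g["Bob"]["Pencil"])
print(bob_pencil_days)  # ['1/1/12', '1/2/12']

# Everything Bob has ever ordered, regardless of the day.
bob_products = sorted(g.successors("Bob"))
print(bob_products)  # ['Pencil']
```

Both lookups only touch Bob's own edges, so they stay cheap as the rest of the graph grows.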

Things might get bad if you want all orders on a specific day. By bad, I don't mean code-wise, but computation-wise. Code-wise it is rather simple:

In : [(from_node, to_node) for from_node, to_node, key in g.edges(keys=True) if key=="1/2/12"]
Out: [('Bob', 'Pencil'), ('Alice', 'Pencil')]

But this scans all the edges in the network and filters out the ones you want. I don't think networkx has any better way.
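One way around the full scan is to keep a small index next to the graph. This is only a sketch and not a networkx feature: orders_by_day and add_order are hypothetical names introduced here, and the approach assumes every order is added through add_order so the index and the graph stay in sync.

```python
from collections import defaultdict

import networkx as nx

g = nx.MultiDiGraph()
# Hypothetical side index: day -> list of (customer, product) pairs.
orders_by_day = defaultdict(list)


def add_order(customer, product, day):
    """Record an order in the graph and in the per-day index."""
    # Skip orders that are already present so the index has no duplicates.
    if g.has_edge(customer, product, key=day):
        return
    g.add_edge(customer, product, key=day)
    orders_by_day[day].append((customer, product))


add_order("Bob", "Pencil", "1/1/12")
add_order("Bob", "Pencil", "1/2/12")
add_order("Alice", "Pencil", "1/3/12")
add_order("Alice", "Pencil", "1/2/12")

# The lookup by day no longer walks every edge in the graph.
print(orders_by_day["1/2/12"])  # [('Bob', 'Pencil'), ('Alice', 'Pencil')]
```

The trade-off is extra memory and the need to update the index whenever an edge is removed.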

Try this: 尝试这个:

Give each node a unique integer ID. Then create a dictionary, nodes, such that:

nodes['pencil'] = [1, 4, ...], where each ID in the list refers to a node that has the pencil attribute. Replace 'pencil' with whatever other attributes you're interested in.

Just make sure that when you add a node with 'pencil', you update the dictionary:

nodes['pencil'].append(new_node_id)

Do the same when you delete a node: remove its ID from the list.
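The idea can be sketched as follows. This is a minimal illustration rather than a definitive implementation: add_item, remove_item and next_id are hypothetical helper names, the name and day attributes are assumptions about what you would store on each node, and G.nodes[...] assumes networkx 2.x or later.

```python
from collections import defaultdict
from itertools import count

import networkx as nx

G = nx.DiGraph()
next_id = count(1)         # source of unique integer node IDs
nodes = defaultdict(list)  # item name -> IDs of the nodes carrying that name


def add_item(name, day):
    """Add a node under a fresh integer ID and register it in the index."""
    node_id = next(next_id)
    G.add_node(node_id, name=name, day=day)
    nodes[name].append(node_id)
    return node_id


def remove_item(node_id):
    """Delete a node and drop its ID from the index."""
    name = G.nodes[node_id]["name"]
    G.remove_node(node_id)
    nodes[name].remove(node_id)


p1 = add_item("pencil", "1/1/12")
e1 = add_item("eraser", "1/1/12")
p2 = add_item("pencil", "1/2/12")

print(nodes["pencil"])  # [1, 3]
print([G.nodes[n]["day"] for n in nodes["pencil"]])  # ['1/1/12', '1/2/12']
```

Because the node key is now the integer ID, the two pencils no longer overwrite each other, and finding all pencil nodes is a dictionary lookup instead of a scan.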

Graphs are not the best approach for this. A relational database such as MySQL is the right tool for storing this data and running queries such as who bought what and when.
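As a rough sketch of that approach, the example below uses Python's built-in sqlite3 module in place of MySQL so that it runs without a database server. The orders table, its column names and the index name are assumptions made for illustration, and the dates are kept as the plain strings used in the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, day TEXT)")
# An index on day lets the database find one day's orders without a full scan.
conn.execute("CREATE INDEX idx_orders_day ON orders (day)")

rows = [
    ("Bob", "Pencil", "1/1/12"),
    ("Bob", "Pencil", "1/2/12"),
    ("Alice", "Pencil", "1/3/12"),
    ("Alice", "Pencil", "1/2/12"),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Who bought what on a given day?
on_day = conn.execute(
    "SELECT customer, product FROM orders WHERE day = ? ORDER BY customer",
    ("1/2/12",),
).fetchall()
print(on_day)  # [('Alice', 'Pencil'), ('Bob', 'Pencil')]

# How many orders has each customer placed?
per_customer = conn.execute(
    "SELECT customer, COUNT(*) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(per_customer)  # [('Alice', 2), ('Bob', 2)]
```

In a real schema the day would more likely be stored in a proper date type, which also makes range queries over several days straightforward.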
