NetworkX - generating a random connected bipartite graph

I'm using NetworkX to generate a bipartite graph using either nx.bipartite.random_graph or nx.bipartite.gnmk_random_graph, as follows:

from networkx.algorithms import bipartite

B = bipartite.gnmk_random_graph(5,6,10)
bottom_nodes, top_nodes = bipartite.sets(B)

However, I get an error:

networkx.exception.AmbiguousSolution: Disconnected graph: Ambiguous solution for bipartite sets.

It is just a single line, so I'm not sure how I could be doing this wrong, or why the package would be returning (what I assume is) an invalid bipartite graph.

Thanks.

EDIT: I just realised that I need to specify a minimum number of edges/probability for the third argument.

E.g. bipartite.random_graph(5,6,0.6) with p>0.5 seems to get rid of the error. Similarly, bipartite.gnmk_random_graph(5,6,11) where k>=n+m. I didn't realise this was the case, as I assumed that if the number of edges was lower than required to connect every vertex there would just be some floating vertices.

Thanks for your help!

Considering that you have a {5, 6} bipartite graph with only 10 edges, it's very likely that your graph will be disconnected (it is so sparse that you even have a high probability of having isolated nodes).

import networkx as nx
import random

random.seed(0)

B = nx.bipartite.gnmk_random_graph(5,6,10)
isolated_nodes = set(B.nodes())
for (u, v) in B.edges():
  isolated_nodes -= {u}
  isolated_nodes -= {v}
print(isolated_nodes)

This will show you that the node with id=1 is isolated. What you can do to make your graph connected is to keep only its largest connected component:

import networkx as nx
import random

random.seed(0)

B = nx.bipartite.gnmk_random_graph(5,6,11)
components = sorted(nx.connected_components(B), key=len, reverse=True)
largest_component = components[0]
C = B.subgraph(largest_component)

Here this will only remove node 1 (an isolated node).

Now the only question is "how random is this new graph?". In other words, does it pick any graph in the set of connected bipartite graphs with 5-6 nodes and 10 edges with equal probability? For now I'm not sure, but it looks decent I think.

Of course what you suggest (picking a graph until it's connected) will be OK, but it can be costly (depending on the 3 parameters, of course).
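For reference, the retry approach mentioned above could be sketched as follows. This is only a minimal sketch: the helper name connected_gnmk_by_rejection and the max_tries cap are assumptions made for illustration, not part of NetworkX. Since gnmk_random_graph draws its k edges at random, keeping the first connected sample should not favour any particular connected graph, at the cost of an unpredictable number of retries.

```python
import random

import networkx as nx
from networkx.algorithms import bipartite


def connected_gnmk_by_rejection(n, m, k, max_tries=10_000, seed=None):
    """Hypothetical helper: resample until the bipartite graph is connected."""
    # A connected graph on n+m nodes needs at least n+m-1 edges,
    # and a bipartite graph on sides of size n and m has at most n*m.
    if not n + m - 1 <= k <= n * m:
        raise ValueError("need n+m-1 <= k <= n*m for a connected graph")
    rng = random.Random(seed)
    for _ in range(max_tries):
        B = bipartite.gnmk_random_graph(n, m, k, seed=rng)
        if nx.is_connected(B):
            return B
    raise RuntimeError("no connected graph found within max_tries")


B = connected_gnmk_by_rejection(5, 6, 12, seed=0)
# The graph is connected, so bipartite.sets no longer raises AmbiguousSolution.
side_a, side_b = bipartite.sets(B)
```

The closer k is to n+m-1, the more samples get rejected, which is the cost mentioned above.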

Edit: I'm dumb, this can't be OK, as the new graph doesn't have the right number of nodes/edges. But there should be a better solution than just retrying until you get a good graph. Hmm, that's interesting...

2nd edit: Maybe this answer could help in finding a good solution to this problem.

3rd edit and a suggestion

As you have noticed in the question I linked, the accepted answer is not really correct, as the generated graph is not selected uniformly at random from the set of expected graphs. We can do something similar here to get a first decent solution. The idea is to first create a connected bipartite graph with the minimum number of edges by iteratively picking isolated nodes and connecting them to the other side of the bipartite graph. For that we create two sets N and M, and create a first edge from N to M. Then we pick a random isolated node (from either N or M) and connect it to a random non-isolated node on the other side. Once no isolated node is left we have exactly n+m-1 edges, so we need to add k-(n+m-1) additional edges to the graph to match the original constraints.

Here is the code corresponding to that algorithm:

import networkx as nx
import random

random.seed(0)

def biased_random_connected_bipartite(n, m, k):
  # A connected graph needs at least n+m-1 edges; a bipartite one has at most n*m
  if not n + m - 1 <= k <= n * m:
    raise ValueError("need n+m-1 <= k <= n*m")

  G = nx.Graph()

  # These will be the two sides of the bipartite graph
  N = set(range(n))
  M = set(range(n, n+m))
  G.add_nodes_from(N)
  G.add_nodes_from(M)

  # Create a first random edge
  u = random.choice(tuple(N))
  v = random.choice(tuple(M))
  G.add_edge(u, v)

  isolated_N = N - {u}
  isolated_M = M - {v}
  # Keep going until no isolated node is left on either side
  while isolated_N or isolated_M:
    # Pick any isolated node:
    isolated_nodes = isolated_N|isolated_M
    u = random.choice(tuple(isolated_nodes))

    # And connect it to the existing connected graph:
    if u in isolated_N:
      v = random.choice(tuple(M-isolated_M))
      G.add_edge(u, v)
      isolated_N.remove(u)
    else:
      v = random.choice(tuple(N-isolated_N))
      G.add_edge(u, v)
      isolated_M.remove(u)

  # Add missing edges; add_edge ignores an edge that already exists,
  # so loop on the actual edge count rather than a fixed number of draws
  while G.number_of_edges() < k:
    u = random.choice(tuple(N))
    v = random.choice(tuple(M))
    G.add_edge(u, v)

  return G

B = biased_random_connected_bipartite(5, 6, 11)

But I repeat, this graph is not selected uniformly at random from the set of all possible connected bipartite graphs (with the constraints we defined on n, m and k). As I said in the other post, this graph will tend to have some nodes with a higher degree than others. This is because we connect isolated nodes to the connected component one by one, so nodes that were added earlier in the process will tend to attract more nodes (preferential attachment). I asked the question on cstheory to see if any bright ideas come up.

Edit: I added another solution besides the one presented here; it's a bit better but still not a good one.

SHORT ANSWER

You want to do

from networkx.algorithms import bipartite

B = bipartite.gnmk_random_graph(5,6,10)
top = [node for node in B.nodes() if B.nodes[node]['bipartite']==0]
bottom = [node for node in B.nodes() if B.nodes[node]['bipartite']==1]

Explanation

When you generate this bipartite graph, it is likely to be disconnected. Let's say it has 2 separate components X and Y. Both of these components are bipartite.

bipartite.sets(B) is supposed to determine which sets are the two partitions of B. But it's going to run into trouble.

Why?

X can be broken into two partitions X_1 and X_2, and Y can be broken into Y_1 and Y_2. What about B? Let top = X_1 + Y_1 and bottom = X_2 + Y_2. This is a perfectly legitimate partition. But top = X_1 + Y_2 and bottom = X_2 + Y_1 is also a perfectly legitimate partition. Which one should it return? It's ambiguous. The algorithm explicitly refuses to make a choice. Instead it gives you an error.
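The ambiguity can be reproduced on a tiny hand-built graph. The following is a minimal sketch; the node labels and the helper is_valid_partition are made up for illustration and are not part of NetworkX.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Two separate components: X = {x1, x2} and Y = {y1, y2}
G = nx.Graph()
G.add_edge("x1", "x2")
G.add_edge("y1", "y2")


def is_valid_partition(G, top, bottom):
    """Hypothetical check: the sides cover all nodes and every edge crosses them."""
    covers_all = (top | bottom) == set(G) and not (top & bottom)
    return covers_all and all((u in top) != (v in top) for u, v in G.edges())


# Both ways of combining the components give a legitimate partition.
valid_a = is_valid_partition(G, {"x1", "y1"}, {"x2", "y2"})
valid_b = is_valid_partition(G, {"x1", "y2"}, {"x2", "y1"})

# bipartite.sets refuses to choose between them.
try:
    bipartite.sets(G)
    raised = False
except nx.AmbiguousSolution:
    raised = True
```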

What to do? You could throw out B if it's disconnected and try again. But you're using B for something, right? Is it reasonable to restrict your attention only to the connected graphs? Maybe, maybe not. That's something you need to figure out. But it is not reasonable to restrict your attention only to the connected graphs if the reason is that disconnected graphs are inconvenient. You seem to hit this error more often than not, so a large fraction of the graphs are disconnected, which means you're throwing out a large fraction of the cases. This is likely to bias the final outcome of whatever you're doing. (Similarly, if you take steps to connect your network, you're no longer getting random graphs from the original distribution, because you've ensured they aren't disconnected; even worse, you may not be sampling uniformly from the connected graphs.)

So what's a better option? After looking at the source code, I found that this method isn't documented as well as it should be. It turns out that for

B = bipartite.gnmk_random_graph(5,6,10)

nodes 0 up to 4 (the first five) are in the top, and nodes 5 up to 10 (the next 6) are in the bottom.

Alternatively, you can get it directly from the data that is encoded in the graph B (not mentioned in the documentation). Try

B = bipartite.gnmk_random_graph(5,6,10)
B.nodes(data=True)
> NodeDataView({0: {'bipartite': 0}, 1: {'bipartite': 0}, 2: {'bipartite': 0}, 3: {'bipartite': 0}, 4: {'bipartite': 0}, 5: {'bipartite': 1}, 6: {'bipartite': 1}, 7: {'bipartite': 1}, 8: {'bipartite': 1}, 9: {'bipartite': 1}, 10: {'bipartite': 1}})

So it's actually storing which node is in which part. Let's use that (and a list comprehension):

top = [node for node in B.nodes() if B.nodes[node]['bipartite']==0]
bottom = [node for node in B.nodes() if B.nodes[node]['bipartite']==1]
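As a quick sanity check, the same split can be wrapped in a small function and verified against the edges. This is only a sketch: sides_from_attribute is a hypothetical helper name, and it assumes the graph carries the 'bipartite' node attribute shown in the output above.

```python
import networkx as nx
from networkx.algorithms import bipartite


def sides_from_attribute(B):
    """Hypothetical helper: split nodes by the stored 'bipartite' attribute."""
    side = nx.get_node_attributes(B, "bipartite")
    top = {node for node, s in side.items() if s == 0}
    bottom = {node for node, s in side.items() if s == 1}
    return top, bottom


B = bipartite.gnmk_random_graph(5, 6, 10, seed=0)
top, bottom = sides_from_attribute(B)

# Every edge should join a top node to a bottom node,
# whether or not B happens to be connected.
edges_cross = all((u in top) != (v in top) for u, v in B.edges())
```

Because it never calls bipartite.sets, this works on disconnected graphs too.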
