How to graph only top 50-100 connected component subgraphs in NetworkX; drawing multiple subgraphs at once

Question

Sorry if this is rough: it's my first post to Stack Overflow! I apologize in advance for not posting code, but nothing I am doing is complex (and maybe that's the problem), so a description should work. I also apologize if I describe the issue badly; I am new to Python and not sure how to build a reproducible example without my data :(

When using NetworkX, I frequently work with large, undirected graphs (let's call them G) with thousands of nodes, built from data imported with pandas. The VAST majority of nodes have only one or two edges, which are just noise to me. It's the clusters with lots of nodes that interest me, and those are the minority.

So I run nx.connected_components to get a long list of the node sets of all the connected components contained within G, review the top results, and print the individual subgraphs that interest me one at a time.

The generator of connected components is typically very long, so I generally just look at the first 50-100 results, because these tend to have what I am looking for.

I tried nx.connected_component_subgraphs, but it returns so many subgraphs I don't need that it's almost as bad as visualizing the whole network at once.

So in short: how can I take the generator/list of sets that nx.connected_components gives me--which I then shorten to the top 50--and make that into a new graph?

I tried converting the output of nx.connected_components to a list, but every element is a set of nodes rather than a graph.

No error messages.

Answer 1

One approach could be something like the following:

First, find all components except the N largest ones:

import itertools
import networkx as nx

small_components = sorted(nx.connected_components(G), key=len)[:-N]

Then remove from G every node belonging to one of these components (note that this modifies G in place):

G.remove_nodes_from(itertools.chain.from_iterable(small_components))
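The two steps above can be wrapped in a small helper. This is only a sketch: the function name keep_largest_components and its parameter n are hypothetical, not part of NetworkX. It works on a copy so the original graph is left untouched, and it guards against n = 0, where the slice [:-0] is empty and nothing would be removed.

```python
import itertools
import networkx as nx


def keep_largest_components(G, n):
    """Return a copy of G reduced to its n largest connected components.

    Hypothetical helper for illustration; not a NetworkX function.
    """
    if n <= 0:
        # [:-0] is an empty slice, so without this guard nothing is removed.
        return G.__class__()
    # Ascending by size, so everything except the last n entries is "small".
    components = sorted(nx.connected_components(G), key=len)
    small_components = components[:-n]
    H = G.copy()  # leave the caller's graph unchanged
    H.remove_nodes_from(itertools.chain.from_iterable(small_components))
    return H


G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (7, 8)])
H = keep_largest_components(G, 1)
print(sorted(H.nodes()))     # [5, 6, 7, 8]
print(G.number_of_nodes())   # 9, the original graph is unchanged
```

If n is at least the number of components, the slice is empty and the whole graph is returned as a copy.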

Here's an example where we keep only the two largest components of a given graph:

In [29]: import itertools
In [30]: import networkx as nx
In [31]: G = nx.Graph()
In [32]: G.add_edges_from([(1, 2), (2, 3), (3, 4), (5, 6), (7, 8), (8, 9)])
In [33]: small_components = sorted(nx.connected_components(G), key=len)[:-2]
In [34]: small_components
Out[34]: [{5, 6}]
In [35]: G.remove_nodes_from(itertools.chain.from_iterable(small_components))
In [36]: G.nodes()
Out[36]: NodeView((1, 2, 3, 4, 7, 8, 9))
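An alternative that builds a new graph instead of deleting nodes is to take the union of the N largest node sets and pass it to G.subgraph. The sketch below assumes a hypothetical helper name, largest_components_subgraph. Note that nx.connected_component_subgraphs, mentioned in the question, was removed in NetworkX 2.4; G.subgraph(nodes) is the usual replacement.

```python
import networkx as nx


def largest_components_subgraph(G, n):
    """Return a new graph made of the n largest connected components of G.

    Hypothetical helper for illustration; not a NetworkX function.
    """
    # Largest components first, then keep only the first n node sets.
    top = sorted(nx.connected_components(G), key=len, reverse=True)[:n]
    # Merge the node sets into a single set of nodes to keep.
    nodes = set().union(*top)
    # subgraph() returns a read-only view; copy() makes an independent graph.
    return G.subgraph(nodes).copy()


G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (5, 6), (7, 8), (8, 9)])
H = largest_components_subgraph(G, 2)
print(sorted(H.nodes()))                  # [1, 2, 3, 4, 7, 8, 9]
print(nx.number_connected_components(H))  # 2
```

The resulting H is an ordinary graph, so it can be passed to nx.draw(H) (which requires matplotlib) to plot all selected components in one figure. Components of equal size are kept in whatever order the sort leaves them.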

1 answer

solution1
0 ACCEPTED 2019-07-02 16:34:08
