
Choose the number of clusters and vertices in python igraph

I have a complete weighted graph as you can see in the image below:

[image: the complete weighted 10-vertex graph]

The Goal: My goal is to be able to choose the number of clusters and the number of vertices in each cluster using Python's implementation of iGraph.

What I've Tried So Far:

import igraph
import cairo
import numpy as np

# Import data (see below, I've included this file)
graph2 = igraph.Graph.Read_Ncol('10_graph.ncol')

# Assigns weights to weights1
weights1 = graph2.es["weight"]

# Converts it to undirected graph
graph2.to_undirected()

# 'graph2.to_undirected()' strips the graph of its weights
# so we restore them to the "weight" attribute after
graph2.es["weight"] = weights1

# Reduces the number of significant figures in each edge label
graph2.es["label"] = np.around(weights1, 2)

# Label all the vertices
graph2.vs["label"] = range(1, 11)

# Things I've tried: (uncomment only one at a time)
# Both return non-clustered graphs.
#community = graph2.community_spinglass(weights1)
community = graph2.community_leading_eigenvector(weights=graph2.es["weight"], clusters=3)
igraph.plot(community)

Running the above code produces the image shown at the top, and you get the same image for both community-finding algorithms. One of them is commented out; if you want to try it instead, uncomment community = graph2.community_spinglass(weights1).

The Problem(s):

  • It looks like none of the graphs are being clustered the way I want them to.
    • I pass weights=graph2.es["weight"], the list of weights corresponding to the edges of the graph.
    • I also explicitly pass clusters=3 to community_leading_eigenvector().
    • I am still not getting any clustering based on the edge weights of this graph.
    • How do I draw distinct clusters, whether by color, by position, or however iGraph differentiates clusters?
  • I am unable to find any official documentation on how to choose the number of vertices in each cluster.
    • Is there a way (even a roundabout one) to choose the number of vertices in each cluster? It doesn't have to be exact, just approximate.

10_graph.ncol

Here's the .ncol file I import to form the graph.

10_graph.ncol =

0 1 0.859412093436
0 2 0.696674188289
0 3 0.588339776278
0 4 0.5104097013
0 5 0.462457938906
0 6 0.427462387255
0 7 0.40350595007
0 8 0.382509071902
0 9 0.358689934558
1 2 0.912797848896
1 3 0.78532402562
1 4 0.681472223562
1 5 0.615574694967
1 6 0.567507619872
1 7 0.534715438785
1 8 0.506595029246
1 9 0.474297090248
2 3 0.941218154026
2 4 0.83850483835
2 5 0.759542327211
2 6 0.70025846718
2 7 0.659110815342
2 8 0.624313042633
2 9 0.584580479234
3 4 0.957468322138
3 5 0.886571688707
3 6 0.821838040975
3 7 0.772665012468
3 8 0.730820137423
3 9 0.684372167781
4 5 0.97372551117
4 6 0.92168855187
4 7 0.870589109091
4 8 0.823583870451
4 9 0.772154420843
5 6 0.98093419661
5 7 0.941236624882
5 8 0.895874086289
5 9 0.843755656833
6 7 0.985707938753
6 8 0.9523988462
6 9 0.906031710578
7 8 0.988193527182
7 9 0.955898136286
8 9 0.988293873257

Answer:

Both methods are just returning a single cluster. This tells me that there's no clear separation between your vertices: they're just one big tangle, so there's no reasonable way to pull them apart.

If I edit the edge weights to have clear separations, like in 10_g2.ncol below, then the clustering algorithms do divide the vertices.

At first this did not produce the groups I expected. I put high weights within the vertex sets {0,1,2,3}, {4,5,6}, and {7,8,9}, and low weights between different sets, but spinglass splits it into {0,1,2,5,6}, {3,4}, and {7,8,9}, while leading_eigenvector splits it into {0,1,2,5,6} and {3,4,7,8,9}.

It turns out this is because to_undirected() changes the order of the edges, so when you reassign the edge weights after this operation, they end up attached to different edges than before. To avoid this, instruct to_undirected() to retain the edge attributes, e.g. by

graph2.to_undirected(combine_edges="max")

to retain the maximum value of each edge attribute (in case there are several directed edges between the same vertices), or

graph2.to_undirected(combine_edges="first")

to retain just the first value seen. (The method should be irrelevant in this case, since there are not multiple edges.)
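
Putting that fix into the original pipeline might look like the sketch below; it is only a sketch, with the file name, labels, and clusters=3 carried over from the question:

import igraph
import numpy as np

# Read the weighted graph, then collapse it to undirected while keeping
# each edge's weight, instead of reassigning the weights afterwards.
graph2 = igraph.Graph.Read_Ncol('10_graph.ncol')
graph2.to_undirected(combine_edges="first")

# Labels for plotting: rounded edge weights and 1-based vertex numbers.
graph2.es["label"] = np.around(graph2.es["weight"], 2)
graph2.vs["label"] = range(1, 11)

community = graph2.community_leading_eigenvector(weights=graph2.es["weight"], clusters=3)
igraph.plot(community)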

Once you have actually split your graph into multiple clusters, the default plot method will differentiate them by color. You can also use community.subgraph(i) to get the subgraph for the i-th cluster and draw just that.
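
For example (the cluster index 0 here is just for illustration):

# Draw only the first cluster: subgraph(0) returns a plain igraph.Graph
# containing that cluster's vertices and the edges among them.
igraph.plot(community.subgraph(0))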

What about controlling the number of clusters? As you know, the leading_eigenvector method has a clusters parameter for the desired number of clusters, but it's apparently more of a guideline than an actual rule: giving clusters=3 results in just 1 cluster with your data, and 2 clusters with mine.

You can get more precise control over the number of clusters with a method that returns a VertexDendrogram instead of a VertexClustering, such as community_edge_betweenness:

com3 = graph2.community_edge_betweenness(clusters=3, directed=False, weights="weight")

To get a clustering with n clusters, you call com3.as_clustering(n), which gave exactly n clusters in all my tests.

They're not necessarily good clusters:

In [21]: print(com3.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 7, 8, 9
[2] 6

In [22]: print(com3.as_clustering(4))
Clustering with 10 elements and 4 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 8, 9
[2] 6
[3] 7

In [23]: print(com3.as_clustering(5))
Clustering with 10 elements and 5 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 4, 8, 9
[3] 6
[4] 7

In [24]: print(com3.as_clustering(6))
Clustering with 10 elements and 6 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 8, 9
[3] 4
[4] 6
[5] 7

Other methods that return a VertexDendrogram are community_walktrap and community_fastgreedy. They both seem to perform better for this particular example, IMO.

In [25]: com5 = graph2.community_walktrap(weights='weight')

In [26]: com6 = graph2.community_fastgreedy(weights='weight')

In [27]: print(com5.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9

In [32]: print(com6.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9
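
As far as I know there is no parameter that fixes how many vertices go into each cluster, but as a rough check on the sizes you end up with, a VertexClustering exposes sizes() and membership; a small sketch:

clustering = com5.as_clustering(3)
print(clustering.sizes())       # vertices per cluster, e.g. [5, 2, 3] for the walktrap result above
print(clustering.membership)    # cluster index of each vertex, in vertex order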

Here is the more variegated weighting I used.

10_g2.ncol:

0 1 0.91
0 2 0.92
0 3 0.93
0 4 0.04
0 5 0.05
0 6 0.06
0 7 0.07
0 8 0.08
0 9 0.09
1 2 0.94
1 3 0.95
1 4 0.14
1 5 0.15
1 6 0.16
1 7 0.17
1 8 0.18
1 9 0.19
2 3 0.96
2 4 0.01
2 5 0.02
2 6 0.03
2 7 0.04
2 8 0.05
2 9 0.06
3 4 0.01
3 5 0.01
3 6 0.01
3 7 0.01
3 8 0.01
3 9 0.01
4 5 0.97
4 6 0.92
4 7 0.05
4 8 0.04
4 9 0.08
5 6 0.98
5 7 0.12
5 8 0.08
5 9 0.08
6 7 0.07
6 8 0.06
6 9 0.06
7 8 0.98
7 9 0.95
8 9 0.98
