I have a complete weighted graph as you can see in the image below:
The Goal: I want to be able to choose the number of clusters and the number of vertices in each cluster using the Python implementation of igraph.
What I've Tried So Far:
import igraph
import cairo
import numpy as np
# Import data (see below, I've included this file)
graph2 = igraph.Graph.Read_Ncol('10_graph.ncol')
# Assigns weights to weights1
weights1 = graph2.es["weight"]
# Converts it to undirected graph
graph2.to_undirected()
# 'graph2.to_undirected()' strips the graph of its weights
# so we restore them to the "weight" attribute after
graph2.es["weight"] = weights1
# Reduces the number of significant figures in each edge label
graph2.es["label"] = np.around(weights1, 2)
# Label all the vertices
graph2.vs["label"] = range(1, 11)
# Things I've tried: (uncomment only one at a time)
# Both return non-clustered graphs.
#community = graph2.community_spinglass(weights1)
community = graph2.community_leading_eigenvector(weights=graph2.es["weight"], clusters=3)
igraph.plot(community)
If the above code is run, you get the above image as output, and you get the same image for both community-finding algorithms I've included. I've commented out one of them; to use the other one, uncomment #community = graph2.community_spinglass(weights1).
The Problem(s):
I pass weights=graph2.es["weight"], the list of weights corresponding to the vertices in the graph, and clusters=3 to community_leading_eigenvector().
10_graph.ncol
Here's the .ncol file I import to form the graph.
10_graph.ncol =
0 1 0.859412093436
0 2 0.696674188289
0 3 0.588339776278
0 4 0.5104097013
0 5 0.462457938906
0 6 0.427462387255
0 7 0.40350595007
0 8 0.382509071902
0 9 0.358689934558
1 2 0.912797848896
1 3 0.78532402562
1 4 0.681472223562
1 5 0.615574694967
1 6 0.567507619872
1 7 0.534715438785
1 8 0.506595029246
1 9 0.474297090248
2 3 0.941218154026
2 4 0.83850483835
2 5 0.759542327211
2 6 0.70025846718
2 7 0.659110815342
2 8 0.624313042633
2 9 0.584580479234
3 4 0.957468322138
3 5 0.886571688707
3 6 0.821838040975
3 7 0.772665012468
3 8 0.730820137423
3 9 0.684372167781
4 5 0.97372551117
4 6 0.92168855187
4 7 0.870589109091
4 8 0.823583870451
4 9 0.772154420843
5 6 0.98093419661
5 7 0.941236624882
5 8 0.895874086289
5 9 0.843755656833
6 7 0.985707938753
6 8 0.9523988462
6 9 0.906031710578
7 8 0.988193527182
7 9 0.955898136286
8 9 0.988293873257
Both methods are just returning a single cluster. This tells me that there's no clear separation between your vertices: they're just a big tangle, so there's no reasonable way to pull them apart.
If I edit the edge weights to have clear separations, like in 10_g2.ncol below, then the clustering algorithms do divide the vertices.
At first this did not produce the groups I expected. I put high weights within the vertex sets {0,1,2,3}, {4,5,6}, and {7,8,9}, and low weights between different sets. But spinglass splits it into {0,1,2,5,6}, {3,4}, and {7,8,9}, while leading_eigenvector splits it into {0,1,2,5,6} and {3,4,7,8,9}.
It turns out this is because to_undirected() changes the order of the edges, so when you reassign the edge weights after this operation, it associates them with different edges than before. To avoid this, you should instruct to_undirected() to retain the edge attributes, e.g. by
graph2.to_undirected(combine_edges="max")
to retain the maximum value of each edge attribute (in case there are several directed edges between the same vertices), or
graph2.to_undirected(combine_edges="first")
to retain just the first value seen. (The method should be irrelevant in this case, since there are not multiple edges.)
Once you have actually split your graph into multiple clusters, the default plot method will differentiate them by colors. You can also use community.subgraph(i) to get the subgraph for the i-th cluster and just draw that.
What about controlling the number of clusters? As you know, the leading_eigenvector method has a clusters parameter for the desired number of clusters, but it's apparently more a guideline than an actual rule: giving clusters=3 results in just 1 cluster with your data, and 2 clusters with mine.
You can get more precise control of the number of clusters with a method which returns a VertexDendrogram instead of a Clustering, such as community_edge_betweenness.
com3 = graph2.community_edge_betweenness(clusters=3, directed=False, weights="weight")
To get a clustering with n clusters, you call com3.as_clustering(n), which gave exactly n clusters for all my tests.
They're not necessarily good clusters:
In [21]: print(com3.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 7, 8, 9
[2] 6
In [22]: print(com3.as_clustering(4))
Clustering with 10 elements and 4 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 8, 9
[2] 6
[3] 7
In [23]: print(com3.as_clustering(5))
Clustering with 10 elements and 5 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 4, 8, 9
[3] 6
[4] 7
In [24]: print(com3.as_clustering(6))
Clustering with 10 elements and 6 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 8, 9
[3] 4
[4] 6
[5] 7
Other methods returning VertexDendrograms are community_walktrap and community_fastgreedy. They both seem to perform better for this particular example, IMO.
In [25]: com5 = graph2.community_walktrap(weights='weight')
In [26]: com6 = graph2.community_fastgreedy(weights='weight')
In [27]: print(com5.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9
In [32]: print(com6.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9
Here is the more variegated weighting I used.
10_g2.ncol:
0 1 0.91
0 2 0.92
0 3 0.93
0 4 0.04
0 5 0.05
0 6 0.06
0 7 0.07
0 8 0.08
0 9 0.09
1 2 0.94
1 3 0.95
1 4 0.14
1 5 0.15
1 6 0.16
1 7 0.17
1 8 0.18
1 9 0.19
2 3 0.96
2 4 0.01
2 5 0.02
2 6 0.03
2 7 0.04
2 8 0.05
2 9 0.06
3 4 0.01
3 5 0.01
3 6 0.01
3 7 0.01
3 8 0.01
3 9 0.01
4 5 0.97
4 6 0.92
4 7 0.05
4 8 0.04
4 9 0.08
5 6 0.98
5 7 0.12
5 8 0.08
5 9 0.08
6 7 0.07
6 8 0.06
6 9 0.06
7 8 0.98
7 9 0.95
8 9 0.98