Choosing the number of clusters and the number of vertices per cluster in python-igraph

Question

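I have a complete weighted graph, as shown in the image below:

Goal: I want to be able to choose the number of clusters, and the number of vertices in each cluster, using Python's iGraph implementation.

What I have tried so far: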

import igraph
import cairo
import numpy as np

# Import data (see below, I've included this file)
graph2 = igraph.Graph.Read_Ncol('10_graph.ncol')

# Assigns weights to weights1
weights1 = graph2.es["weight"]

# Converts it to undirected graph
graph2.to_undirected()

# 'graph2.to_undirected()' strips the graph of its weights
# so we restore them to the "weight" attribute after
graph2.es["weight"] = weights1

# Reduces the number of significant figures in each edge label
graph2.es["label"] = np.around(weights1, 2)

# Label all the vertices
graph2.vs["label"] = range(1, 11)

# Things I've tried: (uncomment only one at a time)
# Both return non-clustered graphs.
#community = graph2.community_spinglass(weights1)
community = graph2.community_leading_eigenvector(weights=graph2.es["weight"], clusters=3)
igraph.plot(community)

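If you run the code above, you get the image above as output. You get the same image for both of the community-detection algorithms I included. I have commented one of them out, so if you want to try the other, uncomment #community = graph2.community_spinglass(weights1) and comment out the community_leading_eigenvector line.

Questions:

It looks like none of the graphs are clustered the way I want.
- I pass weights=graph2.es["weight"], the list of weights on the graph's edges.
- I also explicitly pass clusters=3 to community_leading_eigenvector().
- I still do not get any clustering based on this graph's edge weights.
- How do I plot the proper clusters, distinguished by color or by position, or however iGraph differentiates clusters?
I could not find any official documentation on how to choose the number of vertices in each cluster.
- Is there a way (even a roundabout one) to choose the number of vertices in each cluster? It does not have to be exact, just approximate.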

10_graph.ncol

This is the .ncol file I import to build the graph.

10_graph.ncol =

0 1 0.859412093436
0 2 0.696674188289
0 3 0.588339776278
0 4 0.5104097013
0 5 0.462457938906
0 6 0.427462387255
0 7 0.40350595007
0 8 0.382509071902
0 9 0.358689934558
1 2 0.912797848896
1 3 0.78532402562
1 4 0.681472223562
1 5 0.615574694967
1 6 0.567507619872
1 7 0.534715438785
1 8 0.506595029246
1 9 0.474297090248
2 3 0.941218154026
2 4 0.83850483835
2 5 0.759542327211
2 6 0.70025846718
2 7 0.659110815342
2 8 0.624313042633
2 9 0.584580479234
3 4 0.957468322138
3 5 0.886571688707
3 6 0.821838040975
3 7 0.772665012468
3 8 0.730820137423
3 9 0.684372167781
4 5 0.97372551117
4 6 0.92168855187
4 7 0.870589109091
4 8 0.823583870451
4 9 0.772154420843
5 6 0.98093419661
5 7 0.941236624882
5 8 0.895874086289
5 9 0.843755656833
6 7 0.985707938753
6 8 0.9523988462
6 9 0.906031710578
7 8 0.988193527182
7 9 0.955898136286
8 9 0.988293873257

Answer 1

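Both methods return only a single cluster. That tells me there is no clear separation among your vertices: they are just one big tangle, so there is no reasonable way to split them apart.

If I edit the edge weights to get a clear separation, as in 10_g2.ncol below, then the clustering algorithms do divide the vertices.

At first, this did not produce the groups I expected. I put high weights within the vertex sets {0,1,2,3}, {4,5,6} and {7,8,9}, and low weights between different sets. But spinglass split the graph into {0,1,2,5,6}, {3,4} and {7,8,9}, while leading_eigenvector split it into {0,1,2,5,6} and {3,4,7,8,9}.

It turns out this is because to_undirected() changes the order of the edges, so when you reassign the edge weights after that operation, they get associated with different edges than before. To avoid this, tell to_undirected to preserve the edge attributes, for example with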

graph2.to_undirected(combine_edges="max")

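which keeps the maximum value of each edge attribute (in case there are multiple directed edges between the same pair of vertices), or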

graph2.to_undirected(combine_edges="first")

which keeps only the first value seen. (In this case the choice of method should not matter, since there are no multiple edges.)

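Once you have actually split the graph into multiple clusters, the default plot method distinguishes them by color. You can also use community.subgraph(i) to get the subgraph of the i-th cluster and plot that.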

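How do you control the number of clusters? As you know, the leading_eigenvector method has a clusters parameter for the desired number of clusters, but it is evidently more of a guideline than an actual rule: passing clusters=3 results in just 1 cluster with your data, and 2 clusters with mine.

You can control the number of clusters more precisely by using a method that returns a VertexDendrogram rather than a clustering, such as community_edge_betweenness.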

com3 = graph2.community_edge_betweenness(clusters=3, directed=False, weights="weight")

To get a clustering with n clusters, call com3.as_clustering(n), which gave exactly n clusters in all of my tests.

They are not necessarily good clusters:

In [21]: print(com3.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 7, 8, 9
[2] 6

In [22]: print(com3.as_clustering(4))
Clustering with 10 elements and 4 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 8, 9
[2] 6
[3] 7

In [23]: print(com3.as_clustering(5))
Clustering with 10 elements and 5 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 4, 8, 9
[3] 6
[4] 7

In [24]: print(com3.as_clustering(6))
Clustering with 10 elements and 6 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 8, 9
[3] 4
[4] 6
[5] 7

Other methods that return a VertexDendrogram are community_walktrap and community_fastgreedy. For this particular example they both seem to perform better, IMO.

In [25]: com5 = graph2.community_walktrap(weights='weight')

In [26]: com6 = graph2.community_fastgreedy(weights='weight')

In [27]: print(com5.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9

In [32]: print(com6.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9

Here are the more varied weights I used.

10_g2.ncol:
