igraph in R: efficiently count number of edges between multiple sets of vertices
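I want to compute a matrix of edge counts for a given graph and a partition of that graph into groups. My current solution does not scale to large graphs, and I would like to know whether the computation can be sped up.
I want to use the igraph R package for this, so for a graph G and two sets of vertices set1, set2, I currently count the edges from one set to the other with igraph's %->% operator.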
el <- E(G)[set1 %->% set2]
length(el)
Is there a faster way to do this, either natively in igraph or with a home-made solution (perhaps using Rcpp)?
library(igraph)
# Set up toy graph and partition into two blocks
G <- make_full_graph(20, directed = TRUE)
m <- c(rep(1, 10), rep(2, 10))
block_edge_counts <- function(G, m) {
  # Get list of vertices per group.
  cl <- make_clusters(G, m, modularity = FALSE)
  # Calculate matrix of edge counts between blocks.
  # Returned directly (no assignment), so the result prints when called.
  # sapply binds inner results as columns: entry [s, r] counts edges r -> s.
  sapply(seq_along(cl), function(r) {
    sapply(seq_along(cl), function(s) {
      # Iterate over all block pairs
      el <- E(G)[cl[[r]] %->% cl[[s]]] # edges from block r to block s
      length(el) # number of edges
    })
  })
}
block_edge_counts(G, m)
#>      [,1] [,2]
#> [1,]   90  100
#> [2,]  100   90
# install.packages("bench")
results <- bench::press(
  Nsize = c(10, 100, 1000),
  {
    G <- make_full_graph(Nsize, directed = TRUE)
    m <- c(rep(1, .5 * Nsize), rep(2, .5 * Nsize))
    bench::mark(block_edge_counts(G, m))
  }
)
#> Running with:
#>   Nsize
#> 1    10
#> 2   100
#> 3  1000
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
results
#> # A tibble: 3 × 7
#>   expression              Nsize      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>              <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 block_edge_counts(G, m)    10   1.24ms   1.31ms    752.     39.97KB     12.6
#> 2 block_edge_counts(G, m)   100   3.24ms   3.37ms    284.      4.11MB     32.1
#> 3 block_edge_counts(G, m)  1000 262.73ms 323.61ms      3.29  408.35MB     34.2
You can get noticeably better results by extracting the graph's adjacency matrix and working with it directly.
block_edge_counts_adj <- function(G, m) {
  # Get list of vertices per group.
  cl <- make_clusters(G, m, modularity = FALSE)
  # Dense adjacency matrix: am[i, j] is the number of edges i -> j.
  am <- as_adjacency_matrix(G, sparse = FALSE)
  # Calculate matrix of edge counts between blocks.
  sapply(seq_along(cl), function(r) {
    sapply(seq_along(cl), function(s) {
      # Iterate over all block pairs
      sum(am[cl[[r]], cl[[s]]]) # number of edges from block r to block s
    })
  })
}
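The sparse = FALSE argument of as_adjacency_matrix is essential here: without it, the function returns a sparse matrix, and computing on that takes considerably longer. For sparse graphs the sparse form may be beneficial in terms of memory usage, but on a complete graph like the one in your example it leads to longer computation.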
block_edge_counts_adj2 <- function(G, m) { # using sparse matrix
  # Get list of vertices per group.
  cl <- make_clusters(G, m, modularity = FALSE)
  am <- as_adjacency_matrix(G)
  # Calculate matrix of edge counts between blocks.
  sapply(seq_along(cl), function(r) {
    sapply(seq_along(cl), function(s) {
      # Iterate over all block pairs
      sum(am[cl[[r]], cl[[s]]]) # number of edges from block r to block s
    })
  })
}
results <- bench::press(
  Nsize = c(10, 100, 1000, 3000),
  {
    G <- make_full_graph(Nsize, directed = TRUE)
    m <- c(rep(1, .5 * Nsize), rep(2, .5 * Nsize))
    bench::mark(block_edge_counts(G, m),
                block_edge_counts_adj(G, m),
                block_edge_counts_adj2(G, m),
                min_iterations = 5)
  }
)
results
# A tibble: 12 x 14
#    expression                   Nsize      min   median `itr/sec` mem_alloc `gc/sec` n_itr
#    <bch:expr>                   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
#  1 block_edge_counts(G, m)         10   2.44ms   2.88ms   301.      39.97KB    2.06    146
#  2 block_edge_counts_adj(G, m)     10   1.43ms   1.61ms   560.        1.8KB    2.05    273
#  3 block_edge_counts_adj2(G, m)    10   3.41ms   3.94ms   233.      14.46KB    2.05    114
#  4 block_edge_counts(G, m)        100   6.26ms    7.2ms   135.       4.11MB    0        68
#  5 block_edge_counts_adj(G, m)    100   2.26ms   2.46ms   376.     208.59KB    2.05    183
#  6 block_edge_counts_adj2(G, m)   100   4.82ms   5.32ms   181.       1.34MB    0        91
#  7 block_edge_counts(G, m)       1000 380.86ms 412.24ms     2.43   408.35MB    3.64      2
#  8 block_edge_counts_adj(G, m)   1000  25.85ms  27.46ms    36.0     15.71MB    2.25     16
#  9 block_edge_counts_adj2(G, m)  1000 114.91ms 133.71ms     7.71   130.09MB    1.93      4
# 10 block_edge_counts(G, m)       3000    3.78s    3.88s     0.260    3.59GB    1.92      5
# 11 block_edge_counts_adj(G, m)   3000 197.01ms 218.48ms     4.47   138.75MB    0.894     5
# 12 block_edge_counts_adj2(G, m)  3000    1.19s    1.25s     0.809    1.14GB    2.10      5
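For graphs that really are sparse, one possible way to keep the sparse adjacency matrix and still avoid subsetting it once per block pair is to compute all block counts in a single matrix product. The following is only a minimal sketch and was not part of the benchmark above. The function name block_edge_counts_mm and the indicator matrix S are made up for illustration, and the code assumes that m holds integer group ids 1..k.

```r
library(igraph)
library(Matrix)

# Hypothetical helper (not from the answer above): block counts as t(S) %*% A %*% S.
# Assumption: m contains integer group ids 1..k, one per vertex.
block_edge_counts_mm <- function(G, m) {
  n <- vcount(G)
  k <- max(m)
  # n x k indicator matrix: S[i, g] = 1 if vertex i belongs to group g
  S <- sparseMatrix(i = seq_len(n), j = m, x = 1, dims = c(n, k))
  # Sparse adjacency matrix: A[i, j] = number of edges i -> j
  A <- as_adjacency_matrix(G, sparse = TRUE)
  # Entry [r, s] sums A[i, j] over i in group r and j in group s
  as.matrix(t(S) %*% A %*% S)
}

# Toy graph from the question
G <- make_full_graph(20, directed = TRUE)
m <- c(rep(1, 10), rep(2, 10))
counts <- block_edge_counts_mm(G, m)
print(counts)
```

Here entry [r, s] is the number of edges from block r to block s, so the orientation is the transpose of what the nested sapply versions return. Whether this beats the dense version will depend on the density of the graph, so it is worth benchmarking on your own data.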
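One approach is to contract each of the two sets into a single vertex and then count the edges between them.
Contract the two sets, so that the first gets vertex id 1 and the second gets id 2: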
CG <- contract(G, m)
Now count the 1 -> 2 and 2 -> 1 edges:
> count_multiple(CG, get.edge.ids(CG, c(1,2, 2,1)))
[1] 100 100
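The same contraction idea can be extended to all block pairs at once. The following is a rough sketch rather than part of the answer above: it assumes that contract() keeps every original edge as a (possibly parallel or self-loop) edge between the merged vertices, and it simply tabulates the endpoints of the contracted edge list. The variable names are illustrative.

```r
library(igraph)

# Toy graph from the question
G <- make_full_graph(20, directed = TRUE)
m <- c(rep(1, 10), rep(2, 10))

# Merge each group into one vertex; group g becomes vertex g
CG <- contract(G, m)

# Two-column matrix of (source, target) vertex ids in the contracted graph
el <- as_edgelist(CG, names = FALSE)

# Cross-tabulate sources against targets: entry [r, s] counts edges r -> s,
# with within-group edges appearing on the diagonal as self-loops
block_counts <- table(from = el[, 1], to = el[, 2])
print(block_counts)
```

Note that a block pair with no edges at all would only show up as a zero if both blocks appear somewhere in the edge list; wrapping the columns in factor() with explicit levels would cover that case.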
If you have a complete partition of the graph, you can compute the fraction of edges that fall within groups using
modularity(G, m, resolution = 0)
Your example is fully partitioned into two groups, so you can get the number of edges between the two groups with
> ecount(G)*(1 - modularity(G, m, resolution = 0))
[1] 200
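As a sanity check, the modularity-based count can be compared with a direct count over the edge list. This is a small sketch on the toy graph from the question; the variable names are made up for illustration.

```r
library(igraph)

# Toy graph from the question
G <- make_full_graph(20, directed = TRUE)
m <- c(rep(1, 10), rep(2, 10))

# Direct count: edges whose endpoints lie in different groups
el <- as_edgelist(G, names = FALSE)
between_direct <- sum(m[el[, 1]] != m[el[, 2]])

# Modularity-based count: with resolution = 0, modularity() reduces to the
# fraction of edges inside groups, so 1 minus it is the between-group fraction
between_mod <- ecount(G) * (1 - modularity(G, m, resolution = 0))

print(isTRUE(all.equal(between_direct, between_mod)))
```

Keep in mind that this yields a single total over both directions and all group pairs, not the per-pair matrix asked for in the question.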