
igraph in R: efficiently count number of edges between multiple sets of vertices

I want to calculate the matrix of edge counts between groups, for a given graph and a partition of that graph into groups. My current solution does not scale to large graphs, and I wonder whether the computation can be sped up.

I want to use the igraph R package for this. For a graph G and two sets of vertices set1 and set2, I currently count the edges from one set to the other with igraph's %->% operator:

el <- E(G)[set1 %->% set2]
length(el)
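For comparison, here is a minimal base-R sketch of the same count. It works on a two-column edge list of integer vertex ids, such as the one igraph's as_edgelist(G, names = FALSE) returns. The helper name count_edges_between is hypothetical and not part of igraph. It inspects each edge once, so it costs O(|E|) per pair of sets.

```r
# Count directed edges whose tail is in set1 and whose head is in set2.
# `el` is a two-column matrix of (from, to) vertex ids, e.g. the result of
# igraph's as_edgelist(G, names = FALSE). Hypothetical helper, not igraph API.
count_edges_between <- function(el, set1, set2) {
  sum(el[, 1] %in% set1 & el[, 2] %in% set2)
}

# Toy edge list: the directed 4-cycle 1 -> 2 -> 3 -> 4 -> 1 plus the chord 1 -> 3
el <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1), c(1, 3))
count_edges_between(el, set1 = c(1, 2), set2 = c(3, 4))  # edges 2 -> 3 and 1 -> 3
count_edges_between(el, set1 = c(3, 4), set2 = c(1, 2))  # edge 4 -> 1
```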

Is there a faster way to do this, either natively in igraph or with a home-brewed solution (maybe using Rcpp)?

Example code

library(igraph)

# Set up toy graph and partition into two blocks
G <- make_full_graph(20, directed=TRUE)
m <- c(rep(1,10), rep(2,10))

block_edge_counts <- function(G, m){
  # Get list of vertices per group.
  c <- make_clusters(G, m, modularity = FALSE)
  # Calculate matrix of edge counts between blocks; entry [s, r] is the
  # number of edges from block r to block s.
  sapply(seq_along(c), function(r){
    sapply(seq_along(c), function(s){
      # Iterate over all block pairs
      el <- E(G)[c[[r]] %->% c[[s]]] # list of edges from block r to block s
      length(el) # get number of edges
    })
  })
}

block_edge_counts(G, m)
#>      [,1] [,2]
#> [1,]   90  100
#> [2,]  100   90

Example benchmark

# install.packages("bench")

results <- bench::press(
  Nsize = c(10,100,1000),
  {
    G <- make_full_graph(Nsize, directed=TRUE)
    m <- c(rep(1,.5*Nsize), rep(2,.5*Nsize))
    bench::mark(block_edge_counts(G,m))
  }
)
#> Running with:
#>   Nsize
#> 1    10
#> 2   100
#> 3  1000
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
results
#> # A tibble: 3 × 7
#>   expression              Nsize      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>              <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 block_edge_counts(G, m)    10   1.24ms   1.31ms    752.     39.97KB     12.6
#> 2 block_edge_counts(G, m)   100   3.24ms   3.37ms    284.      4.11MB     32.1
#> 3 block_edge_counts(G, m)  1000 262.73ms 323.61ms      3.29  408.35MB     34.2

You can get significantly better performance by extracting the graph's adjacency matrix and working with it directly.

block_edge_counts_adj <- function(G, m) {
  # Get list of vertices per group.
  c <- make_clusters(G, m, modularity = FALSE)
  am <- as_adjacency_matrix(G, sparse = FALSE)  # dense base-R matrix
  # Calculate matrix of edge counts between blocks.
  sapply(seq_along(c), function(r){
    sapply(seq_along(c), function(s){
      # Iterate over all block pairs
      sum(am[c[[r]], c[[s]]]) # number of edges from block r to block s
    })
  })
}
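The nested sapply above still performs k² separate submatrix sums for k blocks. Below is a sketch of a fully vectorised variant of the same adjacency-matrix idea. It is my own extension, not part of this answer, and block_edge_counts_mat is a hypothetical name. It computes every block pair at once as t(Z) %*% A %*% Z, where Z is the n x k 0/1 membership matrix. With igraph, A would be as_adjacency_matrix(G, sparse = FALSE). Here rows are source blocks and columns are target blocks.

```r
# Vectorised block counts: B = t(Z) %*% A %*% Z, where Z is the n x k 0/1
# membership matrix (Z[i, b] = 1 when vertex i belongs to block b).
# A: n x n adjacency matrix with A[i, j] = number of edges i -> j
# m: block label for each vertex, in vertex order
block_edge_counts_mat <- function(A, m) {
  blocks <- sort(unique(m))
  Z <- outer(m, blocks, `==`) * 1    # logical indicator -> numeric 0/1
  B <- crossprod(Z, A %*% Z)         # B[r, s] = edges from block r to block s
  dimnames(B) <- list(from = blocks, to = blocks)
  B
}

# Adjacency matrix of a complete directed graph on 20 vertices
A <- matrix(1, 20, 20)
diag(A) <- 0
m <- c(rep(1, 10), rep(2, 10))
block_edge_counts_mat(A, m)
```

Two dense products cost about O(n² k), so this still needs the full n x n matrix in memory. It does, however, replace the k² R-level subsetting calls with BLAS-backed matrix multiplication.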

The sparse = FALSE argument to as_adjacency_matrix is essential here. Without it, the function returns a sparse matrix, and subsetting and summing that matrix takes much longer. For a sparse graph this may pay off in memory usage, but on a complete graph like the one in your example it makes the computation much slower:

block_edge_counts_adj2 <- function(G, m) {  # using sparse matrix
  # Get list of vertices per group.
  c <- make_clusters(G, m, modularity = FALSE)
  am <- as_adjacency_matrix(G)
  # Calculate matrix of edge counts between blocks.
  sapply(seq_along(c), function(r){
    sapply(seq_along(c), function(s){
      # Iterate over all block pairs
      sum(am[c[[r]], c[[s]]]) # number of edges from block r to block s
    })
  })
}
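The slow part of the sparse version is likely the k² repeated sparse subsetting rather than sparsity itself. The sketch below keeps everything sparse and needs no subsetting at all: it computes Z %*% A %*% t(Z) with the Matrix package. Z is the k x n indicator matrix from Matrix::fac2sparse(). This is my own variant, and block_edge_counts_sparse is a hypothetical name. It builds A from an edge list; duplicate (i, j) pairs are summed, so multi-edges are counted. I am assuming that igraph's default as_adjacency_matrix(G), which returns a Matrix sparse matrix, could be used in place of A directly.

```r
library(Matrix)

# Sparse block counts: B = Z %*% A %*% t(Z), never densifying A, so the cost
# grows with the number of edges and vertices rather than with n^2.
# el: two-column matrix of (from, to) integer vertex ids
# n:  number of vertices; m: block label for each vertex, in vertex order
block_edge_counts_sparse <- function(el, n, m) {
  A <- sparseMatrix(i = el[, 1], j = el[, 2], x = 1, dims = c(n, n))
  Z <- fac2sparse(factor(m))         # Z[b, i] = 1 when vertex i is in block b
  as.matrix(Z %*% A %*% t(Z))        # [r, s] = edges from block r to block s
}

# Small example: blocks {1, 2} and {3, 4}
el <- rbind(c(1, 3), c(1, 4), c(3, 2), c(2, 1))
block_edge_counts_sparse(el, n = 4, m = c(1, 1, 2, 2))
```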

results <- bench::press(
  Nsize = c(10, 100, 1000, 3000),
  {
    G <- make_full_graph(Nsize, directed=TRUE)
    m <- c(rep(1, .5*Nsize), rep(2, .5*Nsize))
    bench::mark(block_edge_counts(G, m),
                block_edge_counts_adj(G, m),
                block_edge_counts_adj2(G, m),
                min_iterations=5)
  }
)
results
# A tibble: 12 x 14
#    expression                   Nsize      min   median `itr/sec` mem_alloc `gc/sec` n_itr
#    <bch:expr>                   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
#  1 block_edge_counts(G, m)         10   2.44ms   2.88ms   301.      39.97KB    2.06    146
#  2 block_edge_counts_adj(G, m)     10   1.43ms   1.61ms   560.        1.8KB    2.05    273
#  3 block_edge_counts_adj2(G, m)    10   3.41ms   3.94ms   233.      14.46KB    2.05    114
#  4 block_edge_counts(G, m)        100   6.26ms    7.2ms   135.       4.11MB    0        68
#  5 block_edge_counts_adj(G, m)    100   2.26ms   2.46ms   376.     208.59KB    2.05    183
#  6 block_edge_counts_adj2(G, m)   100   4.82ms   5.32ms   181.       1.34MB    0        91
#  7 block_edge_counts(G, m)       1000 380.86ms 412.24ms     2.43   408.35MB    3.64      2
#  8 block_edge_counts_adj(G, m)   1000  25.85ms  27.46ms    36.0     15.71MB    2.25     16
#  9 block_edge_counts_adj2(G, m)  1000 114.91ms 133.71ms     7.71   130.09MB    1.93      4
# 10 block_edge_counts(G, m)       3000    3.78s    3.88s     0.260    3.59GB    1.92      5
# 11 block_edge_counts_adj(G, m)   3000 197.01ms 218.48ms     4.47   138.75MB    0.894     5
# 12 block_edge_counts_adj2(G, m)  3000    1.19s    1.25s     0.809    1.14GB    2.10      5

Another approach is to contract each of the two sets into a single vertex, then count the edges between the two resulting vertices.

Contract the two sets, so that the first becomes vertex 1 and the second vertex 2:

CG <- contract(G, m)

Now count the 1 -> 2 and 2 -> 1 edges:

> count_multiple(CG, get.edge.ids(CG, c(1,2, 2,1)))
[1] 100 100
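As I understand it, contract() keeps all edges (including multi-edges), so on the edge list it amounts to replacing each edge (u, v) with (m[u], m[v]). That relabelling can be done directly, and a single table() then counts every block pair in one pass, without building the contracted graph. This is a base-R sketch; block_edge_counts_el is a hypothetical name. With igraph, the edge list would come from as_edgelist(G, names = FALSE).

```r
# Relabel every edge endpoint with its block (what contraction does to the
# edge list), then tabulate all block pairs in one pass.
# el: two-column matrix of (from, to) integer vertex ids
# m:  block label for each vertex, indexed by vertex id
block_edge_counts_el <- function(el, m) {
  blocks <- sort(unique(m))
  from_block <- factor(m[el[, 1]], levels = blocks)
  to_block   <- factor(m[el[, 2]], levels = blocks)
  unclass(table(from = from_block, to = to_block))  # rows: source, cols: target
}

# Complete directed graph on 20 vertices, built without igraph
n  <- 20
el <- as.matrix(expand.grid(from = 1:n, to = 1:n))
el <- el[el[, 1] != el[, 2], ]   # drop self-loops
m  <- c(rep(1, 10), rep(2, 10))
block_edge_counts_el(el, m)
```

For this example the result has 90 on the diagonal and 100 off it, matching the matrix in the question. Because it touches each edge once, the cost is O(|E|) regardless of the number of blocks.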

If you have a full partitioning of the graph, you can compute the fraction of intra-partition edges with

modularity(G, m, resolution = 0)

Your example has a full partitioning into two sets, so you can get the number of edges between these two sets (both directions combined) as

> ecount(G)*(1 - modularity(G, m, resolution = 0))
[1] 200
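Why resolution = 0 works: igraph documents the directed modularity with resolution parameter γ as below. Here |E| is the number of edges (written this way to avoid a clash with the membership vector m), k_i^out and k_j^in are the out- and in-degrees, and c_i is the block of vertex i. Setting γ = 0 removes the null-model term and leaves only the edges whose endpoints share a block:

$$
Q_\gamma = \frac{1}{|E|}\sum_{i,j}\left(A_{ij} - \gamma\,\frac{k_i^{\mathrm{out}}\,k_j^{\mathrm{in}}}{|E|}\right)\delta(c_i, c_j)
\quad\Longrightarrow\quad
Q_0 = \frac{1}{|E|}\sum_{i,j} A_{ij}\,\delta(c_i, c_j) = \frac{|E_{\mathrm{intra}}|}{|E|},
\qquad
|E_{\mathrm{inter}}| = |E|\,(1 - Q_0).
$$

For the example, |E| = 20 · 19 = 380 and |E_intra| = 2 · 10 · 9 = 180, so |E|(1 − Q_0) = 380 − 180 = 200. For undirected graphs the prefactor is 1/(2|E|) and each edge appears twice in the sum, which gives the same fraction.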
