简体   繁体   中英

Computing graph efficiency metrics of a very large network

What are strategies for calculating the nodal and global efficiency of very large graphs in R?

I'm attempting to calculate the global efficiency of a very large igraph with brainGraph::efficiency(my_graph, type = "global") .

library(igraph); library(brainGraph)  
g <- readRDS("path_to_my_graph.rds")  

> ecount(g); vcount(g) # count of edges and vertices
[1] 715758
[1] 290190

It reliably crashes R every time. Global efficiency is the mean of all nodal efficiencies, so I've attempted to calculate it that way with no success. The weights on each edge of my graph are all 1, so I've omitted the weights, but R still crashes.

# all of these cause R to crash
efficiency(g, type = "global")
efficiency(g, type = "nodal")
efficiency(g, type = "global", weights = NA)
efficiency(g, type = "nodal",  weights = NA)

My graph (~37MB) is available here on GoogleDrive as an .rds file for those who want data to test.

R crashes because brainGraph::efficiency() attempts to calculate an enormous and dense distance matrix, which overwhelms the memory of my machine (32 GB). But I found a solution that chunks out the operation and runs in parallel.

Global efficiency is the mean of all nodal efficiencies in the graph. The nodal efficiency of a vertex i is:

在此输入图像描述

We can sequentially calculate the nodal efficiency for each vertex in the graph, which splits the distance matrix calculation into smaller manageable bits. Because the efficiency of each vertex is independent, we can parallelize the operation so it doesn't take forever.

library(igraph)  
library(doParallel)

# nodal efficiency function
get_eff <- function(i){return((1/(length(V(g)) - 1))*sum(1/distances(g, V(g)[i])[-i]))}

no_cores <- detectCores() - 1 
cl       <- makeCluster(no_cores)
registerDoParallel(cl)  

result <- foreach(i = seq_along(V(g)), .combine = c, .packages = "igraph") %dopar% get_eff(i)

stopCluster(cl)
rm(cl)

global_eff <- mean(result)

Moreover, we can plot the distribution of nodal efficiencies, as well as the global (mean) efficiency, which gives us a better understanding of the network.

library(ggplot2)
data.frame(x = result) %>% 
  ggplot(aes(x)) + 
  geom_histogram() + 
  geom_vline(xintercept = mean(result), col = "red") # global efficiency
  theme_minimal() +
  labs(title = "Nodal efficiences", x = "", y = "Count")

在此输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM