How to visualize a large network in R?

Network visualizations have become common in scientific practice. But as networks grow in size, common visualizations become less useful: there are simply too many nodes/vertices and links/edges, and visualization efforts often end up producing "hairballs".

Some new approaches have been proposed to overcome this issue, e.g.:

I am sure that there are many more approaches. Thus, my question is: how can the hairball issue be overcome, i.e. how can large networks be visualized using R?

Here is some code that simulates an example network:

# Install packages (GGally is installed from GitHub here; it is also on CRAN)
lapply(c("devtools", "sna", "intergraph", "igraph", "network"), install.packages)
library(devtools)
devtools::install_github("ggobi/ggally")

# Load packages
lapply(c("sna", "intergraph", "GGally", "igraph", "network"),
       require, character.only = TRUE)

# Set up data
set.seed(123)
g <- barabasi.game(1000)

# Plot data
g.plot <- ggnet(g, mode = "fruchtermanreingold")
g.plot


This question is related to "Visualizing Undirected Graph That's Too Large for GraphViz?". However, here I am looking not for general software recommendations but for concrete examples (using the data provided above) of techniques that help produce a good visualization of a large network in R (comparable to the examples in this thread: "R: Scatterplot with too many points").

Another way to visualize very large networks is with BioFabric (www.BioFabric.org), which uses horizontal lines instead of points to represent the nodes. Edges are then shown using vertical line segments. A quick D3 demo of this technique is shown at: http://www.biofabric.org/gallery/pages/SuperQuickBioFabric.html

BioFabric is a Java application, but a simple R version is available at: https://github.com/wjrl/RBioFabric

Here is a snippet of R code:

 # You need 'devtools':
 install.packages("devtools")
 library(devtools)

 # you need igraph:
 install.packages("igraph")
 library(igraph)

 # install and load 'RBioFabric' from GitHub
 install_github('wjrl/RBioFabric')
 library(RBioFabric)

 #
 # This is the example provided in the question:
 #

 set.seed(123)
 bfGraph = barabasi.game(1000)

 # This example has 1000 nodes, just like the provided example, but it 
 # adds 6 edges in each step, making for an interesting shape; play
 # around with different values.

 # bfGraph = barabasi.game(1000, m=6, directed=FALSE)

 # Plot it up! For best results, make the PDF in the same
 # aspect ratio as the network, though a little extra height
 # covers the top labels. Given the size of the network,
 # a PDF width of 100 gives us good resolution.

 height <- vcount(bfGraph)
 width <- ecount(bfGraph)
 aspect <- height / width
 plotWidth <- 100.0
 plotHeight <- plotWidth * (aspect * 1.2)
 pdf("myBioFabricOutput.pdf", width=plotWidth, height=plotHeight)
 bioFabric(bfGraph)
 dev.off()

Here is a shot of the BioFabric version of the data provided by the questioner, though networks created with values of m > 1 are more interesting. The inset detail shows a close-up of the upper left corner of the network; node BF4 is the highest-degree node in the network, and the default layout is a breadth-first search of the network (ignoring edge directions) starting from that node, with neighboring nodes traversed in order of decreasing node degree. Note that we can immediately see that, for example, about 60% of node BF4's neighbors have degree 1. We can also see from the strict 45-degree lower edge that this 1000-node network has 999 edges, and is therefore a tree.

[Figure: BioFabric presentation of the example data]
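The two observations read off the plot can also be checked numerically. The following is only a sketch using igraph, not part of RBioFabric; it assumes sample_pa() (the current name for barabasi.game()) and the same seed, and the variable names are made up for illustration:

```r
library(igraph)

set.seed(123)
g <- sample_pa(1000)  # same model as barabasi.game(1000)

# Degree of every node, ignoring edge direction (as the BioFabric layout does)
deg <- degree(g, mode = "all")

# Highest-degree node and its neighbors
hub <- which.max(deg)
hub_neighbors <- as.integer(neighbors(g, hub, mode = "all"))

# Share of the hub's neighbors that have degree 1
share_leaves <- mean(deg[hub_neighbors] == 1)
share_leaves

# A connected graph with n nodes and n - 1 edges is a tree
is_tree <- is_connected(g, mode = "weak") && ecount(g) == vcount(g) - 1
is_tree
```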

Full disclosure: BioFabric is a tool that I wrote.

That's an interesting question. I didn't know most of the tools you listed, so thanks. You can add HivePlot to the list. It's a deterministic method that consists of projecting nodes onto a fixed number of axes (usually 2 or 3). Look at the linked page; there are many visual examples.


It works better if you have a categorical nodal attribute in your dataset, so that you can use it to select which axis a node goes on. For instance, when studying the social network of a university: students on one axis, teachers on another and administrative staff on the third. But of course, it can also work with a discretized numerical attribute (e.g. young, middle-aged and older people on their respective axes).

Then you need another attribute, and it has to be numerical (or at least ordinal) this time. It is used to determine the position of a node on its axis. You can also use some topological measure, such as degree or transitivity (clustering coefficient).

[Figure: how to build a hive plot]
(source: hiveplot.net)

The fact that the method is deterministic is interesting, because it allows comparing different networks representing distinct (but comparable) systems. For example, you can compare two universities (provided you use the same attributes/measures to determine axes and position). It also allows describing the same network in various ways, by choosing different combinations of attributes/measures to generate the visualization. This is actually the recommended way of visualizing a network, using a so-called hive panel.

Several software packages able to generate hive plots are listed on the page I mentioned at the beginning of this post, including implementations in Java and R.
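To make the axis/position idea concrete, here is a rough sketch in igraph and base graphics rather than one of the dedicated hive plot packages. The choices are assumptions made for illustration only: the axis comes from a discretized degree (cut points 1 and 3), the position on the axis comes from the within-axis rank of PageRank, and edges are drawn as straight segments, whereas real hive plots use curves and treat same-axis edges specially.

```r
library(igraph)

set.seed(123)
g <- sample_pa(1000, directed = FALSE)

# Axis assignment: discretized degree (assumed cut points)
# 1 = degree 1, 2 = degree 2-3, 3 = degree 4 or more
deg <- degree(g)
axis_id <- cut(deg, breaks = c(0, 1, 3, Inf), labels = FALSE)

# Position on the axis: within-axis rank of PageRank, scaled to (0, 1]
pr <- page_rank(g)$vector
radius <- ave(pr, axis_id,
              FUN = function(x) rank(x, ties.method = "first") / length(x))

# Three axes at 90, 210 and 330 degrees; convert polar to Cartesian
angle <- c(90, 210, 330)[axis_id] * pi / 180
x <- radius * cos(angle)
y <- radius * sin(angle)

# Draw edges as faint straight segments, then the nodes on top
el <- as_edgelist(g, names = FALSE)
plot(x, y, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "")
segments(x[el[, 1]], y[el[, 1]], x[el[, 2]], y[el[, 2]],
         col = adjustcolor("black", alpha.f = 0.1))
points(x, y, pch = 16, cex = 0.3)
```

Because both the axis and the position are computed from node attributes alone, running the same recipe on another network gives a directly comparable picture.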

I've been dealing with this problem recently. As a result, I've come up with another solution: collapse the graph by communities/clusters. This approach is similar to the third option outlined by the OP above. As a word of warning, this approach will work best with undirected graphs. For example:

library(igraph)

set.seed(123)
g <- barabasi.game(1000) %>%
  as.undirected()

#Choose your favorite algorithm to find communities.  The algorithm below is great for large networks but only works with undirected graphs
c_g <- fastgreedy.community(g)

#Collapse the graph by communities.  This insight is due to this post http://stackoverflow.com/questions/35000554/collapsing-graph-by-clusters-in-igraph/35000823#35000823

res_g <- simplify(contract(g, membership(c_g))) 

The result of this process is the plot produced below, where the vertices' names represent community membership:

plot(res_g, margin = -.5)


The above is clearly nicer than this hideous mess:

plot(g, margin = -.5)


To link communities to the original vertices you will need something akin to the following:

mem <- data.frame(vertices = 1:vcount(g), member = as.numeric(membership(c_g)))

IMO this is a nice approach for two reasons. First, it can in theory deal with a graph of any size, because the process of finding communities can be repeated on the collapsed graph. Second, adopting an interactive approach would yield very readable results. For example, one can imagine the user being able to click on a vertex in the collapsed graph to expand that community, revealing all of its original vertices.
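As a rough sketch of the "repeat on the collapsed graph" idea, the code below collapses twice and keeps track of which second-level community each original vertex ends up in. It uses cluster_fast_greedy(), the current igraph name for fastgreedy.community(); scaling the vertex size by the log of the community size is just an illustrative choice, and the names res_g2 and level2 are made up here.

```r
library(igraph)

set.seed(123)
g <- sample_pa(1000, directed = FALSE)

# First pass: find communities and collapse each one to a single vertex
c_g <- cluster_fast_greedy(g)
res_g <- simplify(contract(g, as.integer(membership(c_g))))

# Vertex i of res_g is community i, so community sizes can drive vertex size
V(res_g)$size <- 3 + 2 * log(as.numeric(sizes(c_g)))
plot(res_g, vertex.label = NA)

# Second pass: repeat the same steps on the collapsed graph
c_g2 <- cluster_fast_greedy(res_g)
res_g2 <- simplify(contract(res_g, as.integer(membership(c_g2))))

# Second-level community of every original vertex
level2 <- as.integer(membership(c_g2))[as.integer(membership(c_g))]
table(level2)
```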

I have looked around and found no good solution. My approach has been to remove nodes and play with edge transparency. It is more of a design solution than a technical one, but I've been able to plot Gephi-like networks of up to 50,000 edges on my laptop without much trouble.

With your example:

plot(simplify(g),
     vertex.size = 0.01,
     edge.arrow.size = 0.001,
     vertex.label.cex = 0.75,
     vertex.label.color = "black",
     vertex.frame.color = adjustcolor("white", alpha.f = 0),
     vertex.color = adjustcolor("white", alpha.f = 0),
     edge.color = adjustcolor(1, alpha.f = 0.15),
     display.isolates = FALSE,
     vertex.label = ifelse(page_rank(g)$vector > 0.1, "important nodes", NA))


[Figure: example of a Twitter mentions network with 30,000 edges]

Yet another interesting package is networkD3. There are a myriad of ways of representing graphs within this library. In particular, I find forceNetwork an interesting option. It is interactive and therefore allows you to really explore your network. It is great for EDA, but it may be too "wiggly" for final work.
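A minimal sketch of forceNetwork on the example data might look like the following. It assumes the networkD3 package is installed; coloring nodes by fast-greedy community is my own choice for illustration, and the data frames are built by hand because forceNetwork expects zero-based node indices in the links table.

```r
library(igraph)
library(networkD3)

set.seed(123)
g <- sample_pa(1000, directed = FALSE)

# Communities are used only to color the nodes (illustrative choice)
comm <- cluster_fast_greedy(g)

# Links: one row per edge, with zero-based source/target indices
links <- as.data.frame(as_edgelist(g, names = FALSE) - 1)
names(links) <- c("source", "target")

# Nodes: row i describes the node with zero-based index i - 1
nodes <- data.frame(name  = as.character(seq_len(vcount(g))),
                    group = as.integer(membership(comm)))

widget <- forceNetwork(Links = links, Nodes = nodes,
                       Source = "source", Target = "target",
                       NodeID = "name", Group = "group",
                       opacity = 0.8, zoom = TRUE)
widget

# Optionally write the interactive plot to a standalone HTML file:
# saveNetwork(widget, "network.html")
```

With zoom = TRUE the widget can be panned and zoomed in the viewer or browser, which helps when exploring a graph of this size.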
