Multiple ggplot2 graphs with shared data

How do I make multiple plots of the same data, each colored by a different factor (column), while recycling the data? Is this something gridExtra does differently from cowplot?

Objective: I want to visually compare the results of different clustering algorithms on the same data, and to do so efficiently. I currently believe the easiest way to compare 2-4 clustering algorithms visually is to plot them next to each other.

Thus, how do I plot the same data side by side, colored differently?

Challenge/Specifications: Performance is very important. I have roughly 30,000 graphs to make, each with 450-480 points. It is critical that the data is "recycled."

I am able to plot them side by side using the packages cowplot and gridExtra. I just started using gridExtra today, but it seems to recycle data and suits my purposes better than cowplot. Update: u/eipi10 demonstrated that facet_wrap could work if I gathered the columns before plotting.

Set up

    # Packages
    library(ggplot2)
    library(cowplot)
    library(gridExtra)
    library(pryr) # memory profile

    # Data creation
    x.points <- c(1, 1, 1, 3, 3, 3, 5, 5, 5)
    y.points <- c(1, 3, 5, 1, 3, 5, 1, 3, 5)
    cl_vert  <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
    cl_hoz   <- c("A", "B", "C", "A", "B", "C", "A", "B", "C")
    cl_cent  <- c("A", "A", "A", "A", "B", "A", "A", "A", "A")
    df <- data.frame(x.points, y.points, cl_vert, cl_hoz, cl_cent)

Graphing them

    # Graph function and individual plots
    graph <- function(data = df, Title = "", color.by, legend.position = "none") {
      ggplot(data, aes(x = x.points, y = y.points)) +
        geom_point(aes(color = as.factor(color.by))) +
        scale_color_brewer(palette = "Set1") +
        labs(subtitle = Title, x = "log(X)", y = "log(Y)", color = "Color") +
        theme_bw() +
        theme(legend.position = legend.position)
    }

    g1 <- graph(Title = "Vertical", color.by = cl_vert)
    g2 <- graph(Title = "Horizontal", color.by = cl_hoz)
    g3 <- graph(Title = "Center", color.by = cl_cent)

    # cowplot
    legend <- get_legend(graph(color.by = cl_vert, legend.position = "right")) # Not a memory waste
    plot <- plot_grid(g1, g2, g3, labels = c("A", "B", "C"))
    title <- ggdraw() + draw_label(paste0("Data Ex ", "1"), fontface = "bold")
    plot2 <- plot_grid(title, plot, ncol = 1, rel_heights = c(0.1, 1)) # rel_heights values control title margins
    plot3 <- plot_grid(plot2, legend, rel_widths = c(1, 0.3))
    plot3

    # gridExtra: grid.arrange() draws immediately and returns a gtable
    plot_grid.ex <- grid.arrange(g1, g2, g3, ncol = 2, top = paste0("Data Ex ", "1"))
    plot_grid.ex
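
A possible variant of graph(), offered as a sketch and not as the method used above: pass the name of the clustering column as a string and look it up with ggplot2's `.data` pronoun, so every plot takes its colors from the same data frame instead of from free-standing vectors. The helper name `graph_by_column` and its argument names are hypothetical.

```r
library(ggplot2)

# Same toy data as in the set-up above
df <- data.frame(
  x.points = c(1, 1, 1, 3, 3, 3, 5, 5, 5),
  y.points = c(1, 3, 5, 1, 3, 5, 1, 3, 5),
  cl_vert  = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  cl_hoz   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  cl_cent  = c("A", "A", "A", "A", "B", "A", "A", "A", "A")
)

# Hypothetical helper: `color_col` is the NAME of a column in `data`
graph_by_column <- function(data, color_col, title = "", legend.position = "none") {
  ggplot(data, aes(x = x.points, y = y.points)) +
    geom_point(aes(color = as.factor(.data[[color_col]]))) +
    scale_color_brewer(palette = "Set1") +
    labs(subtitle = title, x = "log(X)", y = "log(Y)", color = "Color") +
    theme_bw() +
    theme(legend.position = legend.position)
}

# One plot per clustering column; the list is named by panel title
cluster_cols <- c(Vertical = "cl_vert", Horizontal = "cl_hoz", Center = "cl_cent")
plots <- Map(function(col, ttl) graph_by_column(df, col, ttl),
             cluster_cols, names(cluster_cols))
```

The resulting list should be usable with `gridExtra::grid.arrange(grobs = plots, ncol = 2)` in place of listing g1, g2, and g3 by hand.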

Memory usage with pryr

    # Comparison
    object_size(plot_grid) # 315 kB (note: plot_grid is the cowplot function; the gridExtra result is stored in plot_grid.ex)
    object_size(plot3) # 1.45 MB
    # Individual objects
    object_size(g1) # 756 kB
    object_size(g2) # 756 kB
    object_size(g3) # 756 kB
    object_size(g1, g2, g3) # 888 kB
    object_size(legend) # 43.6 kB

Additional Questions: After writing this question and providing sample data, I remembered gridExtra, tried it, and it seems to take up less memory than the combined size of its component graphs. I thought g1, g2, and g3 shared the same data except for the coloring assignment, which would explain the roughly 130 kB difference between an individual component and the total object size. How is it that plot_grid takes up even less space than that? ls.str(plot_grid) doesn't seem to show any consolidation of g1, g2, and g3. Would my best bet be to use lineprof() and run line-by-line comparisons?
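
A small sketch of why a combined size can be smaller than the sum of the individual sizes, assuming pryr's documented behavior that object_size() counts components shared between its arguments only once. The vector and lists below are illustrative and are not taken from the question.

```r
library(pryr)

big <- runif(1e5)   # one numeric vector, roughly 800 kB

# Two lists that reference the same vector. R does not copy `big`
# until one of the references is modified (copy-on-modify).
a <- list(data = big, label = "first")
b <- list(data = big, label = "second")

size_a    <- object_size(a)
size_b    <- object_size(b)
size_both <- object_size(a, b)   # the shared vector is counted once

# Modifying one reference forces a real duplicate, so the combined size grows
b$data[1] <- 0
size_after <- object_size(a, b)
```

The same accounting plausibly explains why object_size(g1, g2, g3) is far below three times 756 kB: the three plots reference the same df and much of the same ggplot2 machinery.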

Sources I've skimmed/read/consulted:

Please bear with me as I am a new programmer (I only really started scripting in December); I don't understand all the technical details yet, but I want to.

Faceting will work here if you convert your data to long format. Here's an example:

    library(tidyverse)

    df %>% gather(method, cluster, cl_vert:cl_cent) %>%
      ggplot(aes(x = x.points, y = y.points)) +
        geom_point(aes(color = cluster)) +
        scale_color_brewer(palette = "Set1") +
        theme_bw() +
        facet_wrap(~ method)
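
In current tidyr, gather() is superseded by pivot_longer(). An equivalent reshape might look like the sketch below; the column names `method` and `cluster` simply mirror the ones chosen above, and the toy data frame is rebuilt so the block runs on its own.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  x.points = c(1, 1, 1, 3, 3, 3, 5, 5, 5),
  y.points = c(1, 3, 5, 1, 3, 5, 1, 3, 5),
  cl_vert  = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  cl_hoz   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  cl_cent  = c("A", "A", "A", "A", "B", "A", "A", "A", "A")
)

# One row per point per clustering method
long <- pivot_longer(df, cols = cl_vert:cl_cent,
                     names_to = "method", values_to = "cluster")

# One facet per clustering method, built from a single data frame
p <- ggplot(long, aes(x = x.points, y = y.points)) +
  geom_point(aes(color = cluster)) +
  scale_color_brewer(palette = "Set1") +
  theme_bw() +
  facet_wrap(~ method)
```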


If you're after a boost in performance, don't use any of those packages, including ggplot2. gridExtra, cowplot, and the like will always make things slower, and they do not "recycle" data in any sense (it's not clear what you mean by this).

I would recommend doing all the time-consuming data processing outside ggplot2 and plotting results that are already much closer to the final mapping (i.e. groups of colours are already assigned, etc.). You may find that ggplot2 then becomes overkill and too slow for your application (lattice is typically faster, and so is base plot).
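
As a rough sketch of that suggestion, assuming base graphics are acceptable for the final output: assign colours up front, then draw one panel per clustering with par(mfrow). The function name `plot_clusterings`, the palette choice, and the output file are illustrative only.

```r
df <- data.frame(
  x.points = c(1, 1, 1, 3, 3, 3, 5, 5, 5),
  y.points = c(1, 3, 5, 1, 3, 5, 1, 3, 5),
  cl_vert  = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  cl_hoz   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  cl_cent  = c("A", "A", "A", "A", "B", "A", "A", "A", "A")
)

# Hypothetical helper: one base-graphics panel per clustering column
plot_clusterings <- function(data, cluster_cols, file, main = "") {
  # Fix the label-to-colour mapping before any drawing happens
  labels_all <- sort(unique(unlist(lapply(data[cluster_cols], as.character))))
  cols <- setNames(hcl.colors(length(labels_all), "Dark 3"), labels_all)

  pdf(file, width = 3 * length(cluster_cols), height = 3.5)
  on.exit(dev.off())
  par(mfrow = c(1, length(cluster_cols)), oma = c(0, 0, 2, 0))
  for (col in cluster_cols) {
    plot(data$x.points, data$y.points,
         col = unname(cols[as.character(data[[col]])]), pch = 19,
         xlab = "log(X)", ylab = "log(Y)", main = col)
  }
  mtext(main, outer = TRUE, font = 2)  # shared title across panels
  invisible(file)
}

out <- plot_clusterings(df, c("cl_vert", "cl_hoz", "cl_cent"),
                        file = tempfile(fileext = ".pdf"), main = "Data Ex 1")
```

Whether this is actually faster than ggplot2 for 30,000 figures would need to be timed on the real data.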

If you actually want shared data, I would think something like d3.js might get you closest to this goal, although it would simply leave the data duplication for the browser to do at rendering time. Once the data points are rendered on screen, they have to be independent and duplicated, so the question is where in the pipeline the duplication is most convenient and efficient to do.
