
Efficiently plotting millions of data points in R

I'm trying to plot a few million data points in R. I'm currently using ggplot2 (but I'm open to suggestions of alternative packages). The problem is that the graph takes too long to render (often upwards of a minute). I'm looking for ways to do this faster, ideally in real time. I would appreciate any help; I've attached code to the question for clarity.

Creating a (random) data frame with ~500,000 data points (100,000 rows of 5 columns):

letters <- c("A", "B", "C", "D", "E", "F", "G")
myLetters <- sample(x = letters, size = 100000, replace = TRUE)
direction <- c("x", "y", "z")
factor1 <- sample(x = direction, size = 100000, replace = TRUE)
factor2 <- runif(100000, 0, 20)
factor3 <- runif(100000, 0, 100)
decile <- sample(x = 1:10, size = 100000, replace = TRUE)

new.plot.df <- data.frame(letters = myLetters, factor1 = factor1, factor2 = factor2,
                          factor3 = factor3, decile = decile)

Now, plotting the data:

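library(ggplot2)

color.plot <- ggplot(new.plot.df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(aes(alpha = factor2)) +
  facet_grid(decile ~ letters)

print(color.plot)  # the slow step: the points are actually drawn when the plot is printed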


How do I make the rendering faster?

There are two main sources of slowness in R plotting:

  1. the graphics device and backend in general
  2. plotting too many complicated shapes

The graphical backend can be changed with the appropriate device-opening and backend-selection commands. For me, this usually helps:

options(bitmapType='cairo')  #set the drawing backend, this may speed up PNG rendering
x11(type='cairo')   #drawing to X11 window using cairo is the fastest interactive output for me

(X11 is not available on Windows and is a little confusing in RStudio, but that's a different story.)

Plotting simpler shapes helps quite a lot. By default, ggplot uses some variant of pch=19 or pch=20, which are far too slow because of anti-aliasing. You can usually get about 10x faster rendering by using pch='.' (which is just a single non-anti-aliased pixel) or pch=16 (which is a small non-anti-aliased circle). The same applies to ggplot with shape='.' and shape=16, respectively. If you have a lot of points and set an appropriately low alpha, you'll get the "anti-aliasing" for free.
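Applied to the question's plot, the shape change might look like the following minimal sketch, assuming ggplot2 is installed. It regenerates random data with the same columns as the question (using `LETTERS[1:7]` rather than a custom `letters` vector), replaces the per-point `alpha = factor2` mapping with a fixed low alpha (the value 0.3 is an arbitrary choice), and renders into a temporary PNG file whose size is only illustrative.

```r
library(ggplot2)

# Random data with the same columns as in the question (100,000 rows).
set.seed(1)
n <- 100000
df <- data.frame(
  letters = sample(LETTERS[1:7], n, replace = TRUE),
  factor1 = sample(c("x", "y", "z"), n, replace = TRUE),
  factor2 = runif(n, 0, 20),
  factor3 = runif(n, 0, 100),
  decile  = sample(1:10, n, replace = TRUE)
)

# shape = "." draws every point as a single pixel. A fixed, low alpha
# (0.3 is an arbitrary choice) replaces the per-point alpha mapping,
# so overlapping points in dense regions still build up darker areas.
fast.plot <- ggplot(df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(shape = ".", alpha = 0.3) +
  facet_grid(decile ~ letters)

# Drawing happens only when the plot is printed; here it goes to a PNG file.
# The PNG backend follows getOption("bitmapType").
out.file <- tempfile(fileext = ".png")
png(out.file, width = 1400, height = 1000)
print(fast.plot)
invisible(dev.off())
```

Dropping the `alpha = factor2` mapping is a design choice for speed and readability; keep the mapping if the transparency is meant to encode `factor2`.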

For me, just switching the graphical backend and setting a different point shape cut the drawing time for 1 million points from around 30 minutes to a few seconds. 500k data points should render in under a second.
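To check the effect on your own machine, a small base-R timing sketch could look like this. It draws 500,000 uniformly random points (hypothetical data) into a temporary PNG file, once with pch = 19 and once with pch = '.'. The PNG backend is whatever `getOption("bitmapType")` selects, so running `options(bitmapType='cairo')` first times the cairo backend. The helper name `time_png` is made up for this example, and the actual numbers depend on your device and hardware.

```r
# Hypothetical test data: 500,000 uniformly random points.
set.seed(42)
n <- 500000
x <- runif(n)
y <- runif(n)

# Hypothetical helper: times opening a PNG device, drawing all points with
# the given point shape, and closing the device (which writes the file).
time_png <- function(pch) {
  f <- tempfile(fileext = ".png")
  on.exit(unlink(f))
  system.time({
    png(f, width = 1000, height = 1000)
    plot(x, y, pch = pch)
    dev.off()
  })[["elapsed"]]
}

t.slow <- time_png(19)   # filled circle, similar to ggplot's default shape
t.fast <- time_png(".")  # a single pixel per point
cat(sprintf("pch = 19: %.2f s, pch = '.': %.2f s\n", t.slow, t.fast))
```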

EDIT (Jan 2020): I recently made a library that speeds this up even more: https://github.com/exaexa/scattermore

In general, there are two strategies I use for this:

1) As described in the comments, taking a reasonably representative sample of your data is not going to change what your plot shows, and it reduces the number of points to render.
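A minimal sketch of the sampling approach, assuming ggplot2 is installed. The one-million-row data frame is hypothetical stand-in data with columns like the question's, and the sample size of 50,000 rows is an arbitrary choice. A simple random sample keeps each group (or facet) at roughly its original share of the points.

```r
library(ggplot2)

# Hypothetical large data set with columns like the question's (1 million rows).
set.seed(123)
n <- 1e6
big.df <- data.frame(
  factor1 = sample(c("x", "y", "z"), n, replace = TRUE),
  factor2 = runif(n, 0, 20),
  factor3 = runif(n, 0, 100)
)

# Simple random sample of rows, drawn without replacement.
sample.size <- 50000
sampled.df <- big.df[sample.int(nrow(big.df), sample.size), ]

# Plot only the sampled rows: 50,000 points are drawn instead of 1,000,000.
sampled.plot <- ggplot(sampled.df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(alpha = 0.3)
print(sampled.plot)
```

If some groups are rare, a per-group (stratified) sample may preserve them better than a simple random sample.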

2) Another trick I use is to create the plot object without displaying it and instead save the plot to a PNG image. This speeds up the process a lot, because when you open the image it is a raster image rather than a vector image.
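A minimal sketch of the render-to-file approach, assuming ggplot2 is installed. The data, the output path in `tempdir()`, and the width, height, and dpi values are illustrative only. `ggsave()` draws the plot straight into a PNG file, so the interactive graphics device never has to draw the points.

```r
library(ggplot2)

# Hypothetical data: 500,000 random points.
set.seed(7)
n <- 500000
df <- data.frame(
  factor1 = sample(c("x", "y", "z"), n, replace = TRUE),
  factor2 = runif(n, 0, 20),
  factor3 = runif(n, 0, 100)
)

# Building the ggplot object is cheap: no points are drawn at this stage.
p <- ggplot(df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(shape = ".")

# Render directly to a raster PNG file instead of displaying the plot.
out.file <- file.path(tempdir(), "points.png")
ggsave(out.file, plot = p, width = 8, height = 6, dpi = 150)

# Afterwards, open out.file in any image viewer.
```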

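Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.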

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM