Wrapping R's plot function (or ggplot2) to prevent plotting of large data sets

Rather than ask how to plot big data sets, I want to wrap plot so that code that produces a lot of plots doesn't get hammered when it hits a large object. How can I wrap plot in a very simple manner so that all of its functionality is preserved, but it first tests whether the object being passed is too large?

This code works for very vanilla calls to plot, but it lacks the generality of plot (see below).

myPlot <- function(x, ...){
    isBad <- any( (length(x) > 10^6) || (object.size(x) > 8*10^6) || (nrow(x) > 10^6) )
    if(is.na(isBad)){isBad = FALSE}
    if(isBad){
        stop("No plots for you!")
    }
    return(plot(x, ...))
}

x = rnorm(1000)
x = rnorm(10^6 + 1)

myPlot(x)

An example where this fails:

x = rnorm(1000)
y = rnorm(1000)
plot(y ~ x)
myPlot(y ~ x)

Is there some easy way to wrap plot so that it checks the data to be plotted while still passing through all of the arguments? If not, then how about ggplot2? I'm an equal opportunity non-plotter. (In cases where the data set is large, I will use hexbin, sub-sampling, density plots, etc., but that's not the focus here.)


Note 1: When testing ideas, I recommend using a threshold of about 100 (or setting a variable, e.g. myThreshold <- 1000) rather than 1M; otherwise you will spend a lot of time waiting on slow plots. :)
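Following Note 1, here is a minimal sketch that reads the limit from an option instead of hard-coding 10^6, so it can be dropped to something like 100 while experimenting. The option name myplot.maxN and the helper tooBigToPlot() are hypothetical names chosen for this example. The check uses NROW(), which returns length(x) for vectors and nrow(x) for matrices and data frames, so no NA handling is needed.

```r
## Hypothetical option "myplot.maxN"; falls back to 1e6 when it is unset
tooBigToPlot <- function(x, maxN = getOption("myplot.maxN", 10^6)) {
    ## NROW() is length(x) for vectors and nrow(x) for matrices/data frames;
    ## the byte limit assumes roughly 8 bytes per element, like 8*10^6 above
    (length(x) > maxN) || (NROW(x) > maxN) ||
        (as.numeric(object.size(x)) > 8 * maxN)
}

myPlot <- function(x, ...) {
    if (tooBigToPlot(x)) stop("No plots for you!")
    invisible(plot(x, ...))
}

## While testing, use a small threshold so the "too big" branch is cheap to hit
old <- options(myplot.maxN = 100)
tooBigToPlot(rnorm(50))   # FALSE
tooBigToPlot(rnorm(500))  # TRUE
options(old)              # restore the previous setting
```

Note that this still only looks at x itself, so a formula such as y ~ x passes the check regardless of how much data it refers to.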

The problem you have is that, as currently coded, myPlot() assumes x is a data object, but then you pass it a formula. R's plot() handles this via methods: when x is a formula, the plot.formula() method is dispatched instead of the basic plot.default() method.
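A quick console check of this point: a formula is a small language object of class "formula", so the size tests in the original myPlot() only see the three-element call y ~ x, never the 1000 data points it refers to, and UseMethod() chooses a method by that class. The generic describe() below is a hypothetical toy used only to show the dispatch.

```r
x <- rnorm(1000)
y <- rnorm(1000)
f <- y ~ x

class(f)    # "formula"
length(f)   # 3: the `~` symbol, y and x
NROW(f)     # also 3, nowhere near the 1000 data points

## Toy generic: UseMethod() dispatches on class(obj)
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "default method"
describe.formula <- function(obj, ...) "formula method"

describe(x)   # "default method"
describe(f)   # "formula method"
```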

You need to do the same:

myplot <- function(x, ...)
    UseMethod("myplot")

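myplot.default <- function(x, ...) {
    ## NROW() returns nrow(x) for matrices/data frames and length(x) for
    ## vectors, so the test never evaluates to NA
    isBad <- (length(x) > 10^6) || (object.size(x) > 8*10^6) ||
             (NROW(x) > 10^6)
    if(isBad){
        stop("No plots for you!")
    }
    invisible(plot(x, ...))
}

myplot.formula <- function(x, data = parent.frame(), ...) {
    ## Evaluate the variables named in the formula into a data frame
    ## (response first), so the size checks see the actual data
    processed_x <- model.frame(x, data = data)
    ## Move the response to the end so plot.data.frame() draws y against x,
    ## as plot(y ~ x) does for a two-sided formula with one predictor
    processed_x <- processed_x[c(2:ncol(processed_x), 1)]
    myplot.default(processed_x, ...)
}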

You can steal code from plot.formula() to use in the code needed to process x into an object. Alternatively, you can roll your own following the standard non-standard evaluation rules (PDF).
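One way to do the processing step, offered as a sketch rather than the only option, is stats::model.frame(), which plot.formula() itself calls to evaluate the formula's variables. It returns an ordinary data frame with the response first, so the existing size checks can be applied to it directly. The data frame d below is made-up example data.

```r
d <- data.frame(x = rnorm(1000), y = rnorm(1000))

## Evaluate the variables named in the formula against d
mf <- model.frame(y ~ x, data = d)

names(mf)        # "y" "x": response first, then the predictor
nrow(mf)         # 1000
object.size(mf)  # memory used by the data itself, unlike object.size(y ~ x)
```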
