
How to identify the function used by geom_smooth()

I would like to display a plot created by geom_smooth(), but it is important for me to be able to describe how the plot was created.

I can see from the documentation that when n >= 1000, gam is used as the smoothing function, but I cannot see how many knots are used or which function generated the smoothing.
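A minimal sketch of how the knots can be inspected, assuming ggplot2 >= 3.3.0 (whose documentation states that for 1,000 or more observations the default is mgcv::gam() with formula y ~ s(x, bs = "cs") and method = "REML") and a simplified stand-in dataset rather than the exact data below. Fitting that model directly with mgcv exposes the basis dimension and knot locations of the smoother. Reading the knots from the `xp` element relies on an mgcv internal detail for the "cr"/"cs" bases; check `str(sm)` if your mgcv version stores them differently.

```r
library(mgcv)

# simplified stand-in data (an assumption, not the question's exact dataset)
set.seed(12345)
n <- 3000
d <- data.frame(x = seq(0, 4*pi, length.out = n))
d$y <- as.numeric(2*sin(2*d$x) + rnorm(n) > 0)

# the model geom_smooth() documents as its default for >= 1000 observations
fit <- gam(y ~ s(x, bs = "cs"), data = d, method = "REML")

sm <- fit$smooth[[1]]
cat("smooth term:       ", sm$label, "\n")
cat("basis dimension k: ", sm$bs.dim, "\n")
# for the cubic regression spline bases ("cr"/"cs") mgcv keeps the knots in xp
cat("knot locations:    ", signif(sm$xp, 3), "\n")
cat("effective df:      ", signif(summary(fit)$edf, 3), "\n")
```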

Example:

library(ggplot2)

set.seed(12345)
n <- 3000
# simulate a binary outcome that depends non-linearly on x1
x1 <- seq(0, 4*pi, length.out = n)
x2 <- runif(n)
x3 <- rnorm(n)
lp <- 2*sin(2*x1) + 3*x2 + 3*x3
p <- 1/(1 + exp(-lp))
y <- ifelse(p > 0.5, 1, 0)

df <- data.frame(x1, x2, x3, y)

# default plot
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth() 

# specify method='gam'
# linear
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam') 

# specify gam and splines
# Shows non-linearity, but different from default
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam',
              method.args = list(family = "binomial"),
              formula = y ~ splines::ns(x, 7)) 

If I want to use the default parameters, is there a way to identify the function used to create the smoothing, so that I can describe it accurately in the methods section of the analysis?
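One lightweight check, sketched under the assumption of a reasonably recent ggplot2 (and with mgcv installed): when method or formula is left at its default, geom_smooth() emits a message naming the method and formula it chose while the plot is built. The sketch below builds the plot with ggplot2::ggplot_build() and collects that message with base R's withCallingHandlers() instead of printing it. The simulated data are a simplified stand-in for the question's df.

```r
library(ggplot2)

# simplified stand-in data (an assumption, not the question's exact dataset)
set.seed(12345)
n <- 3000
df <- data.frame(x1 = seq(0, 4*pi, length.out = n))
df$y <- as.numeric(2*sin(2*df$x1) + rnorm(n) > 0)

p <- ggplot(df, aes(x = x1, y = y)) + geom_smooth()

# collect every message emitted while the plot is built, without printing it
msgs <- character()
built <- withCallingHandlers(
  ggplot_build(p),
  message = function(m) {
    msgs <<- c(msgs, conditionMessage(m))
    invokeRestart("muffleMessage")
  }
)

# the captured text names the chosen method and formula
cat(msgs, sep = "")
```

The captured string can then be pasted into a methods section or a plot caption as-is.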

[Image: geom_smooth variations]

I wrote a function that reverse-engineers the steps in StatSmooth's setup_params function to get the actual method / formula parameters used for plotting.

The function expects a ggplot object as its input, with an optional second parameter specifying the layer that corresponds to geom_smooth (defaults to 1 if unspecified). It returns a text string of the form "Method: [method used], Formula: [formula used]", and also prints all the parameters to the console.

The intended use case is twofold:

  1. Add the text string as-is to the plot as a title / subtitle / caption, for quick reference during analysis;
  2. Read off the console printout, and either include the information elsewhere or format it nicely by hand (e.g. as parsed plotmath expressions) for annotation in the plot, for a report / presentation.

Function:

get.params <- function(plot, layer = 1){

  # return empty string if the specified geom layer doesn't use stat = "smooth"
  if(!"StatSmooth" %in% class(plot$layers[[layer]]$stat)){
    message("No smoothing function was used in this geom layer.")
    return("")
  }

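  # recreate data used by this layer, in the format expected by StatSmooth
  # (this code chunk takes heavy reference from ggplot2:::ggplot_build.ggplot;
  # it calls unexported ggplot2 internals via `:::`, which can change between
  # ggplot2 releases, so it may need adjusting for your installed version)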
  layer.data <- plot$layers[[layer]]$layer_data(plot$data)
  layout <- ggplot2:::create_layout(plot$facet, plot$coordinates)
  data <- layout$setup(list(layer.data), plot$data, plot$plot_env)
  data[[1]] <- plot$layers[[layer]]$compute_aesthetics(data[[1]], plot)
  scales <- plot$scales
  data[[1]] <- ggplot2:::scales_transform_df(scales = scales, df = data[[1]])
  layout$train_position(data, scales$get_scales("x"), scales$get_scales("y"))
  data <- layout$map_position(data)[[1]]

  # set up stat params (e.g. replace "auto" with actual method / formula)
  stat.params <- suppressMessages(
    plot$layers[[layer]]$stat$setup_params(data = data, 
                                           params = plot$layers[[layer]]$stat_params)
    )

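  # reverse the last step in setup_params: we only need the method's name, not
  # the function itself (some ggplot2 versions may convert every method name to
  # a function here, so map the common ones back to a name)
  if(is.function(stat.params$method)){
    known <- list(gam = mgcv::gam, loess = stats::loess,
                  lm = stats::lm, glm = stats::glm)
    hit <- vapply(known, identical, logical(1), stat.params$method)
    stat.params$method <- if(any(hit)) names(known)[hit][1] else "custom function"
  }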

  print(stat.params)

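  # deparse() may split a long formula over several strings; join them into one
  return(paste0("Method: ", stat.params$method,
                ", Formula: ", paste(deparse(stat.params$formula), collapse = "")))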
}
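To confirm that a reported description really matches what was drawn, one option (a sketch, assuming ggplot2 >= 3.3.0 so that the default gam is fitted with method = "REML", and simplified stand-in data) is to pull the plotted curve out with ggplot2::layer_data() and compare it with predictions from an independent mgcv::gam() fit of the reported model. The refit uses the variable names x and y because that is how geom_smooth() names the aesthetics inside its formula.

```r
library(ggplot2)
library(mgcv)

# simplified stand-in data (an assumption, not the question's exact dataset)
set.seed(12345)
n <- 3000
df <- data.frame(x1 = seq(0, 4*pi, length.out = n))
df$y <- as.numeric(2*sin(2*df$x1) + rnorm(n) > 0)

p1 <- ggplot(df, aes(x = x1, y = y)) + geom_smooth()

# the fitted curve exactly as drawn (evaluated at 80 x values by default)
curve <- suppressMessages(layer_data(p1, 1))

# refit the model we intend to report, outside of ggplot2
fit <- gam(y ~ s(x, bs = "cs"), data = data.frame(x = df$x1, y = df$y),
           method = "REML")
pred <- as.numeric(predict(fit, newdata = data.frame(x = curve$x)))

# TRUE when the written description reproduces the plotted smoother
isTRUE(all.equal(curve$y, pred, tolerance = 1e-6))
```

If this returns FALSE, the installed ggplot2 is probably using different defaults (for example, an older release that does not pass method = "REML").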

Demonstration:

p <- ggplot(df, aes(x = x1, y = y)) # df is the sample dataset in the question

# default plot for 1000+ observations
# (method defaults to gam & formula to 'y ~ s(x, bs = "cs")'; in ggplot2 >= 3.3.0
# gam is also fitted with method = "REML", which the returned string omits)
p1 <- p + geom_smooth()
p1 + ggtitle(get.params(p1))

# specify method = 'gam'
# (formula defaults to `y ~ x`)
p2 <- p + geom_smooth(method='gam')
p2 + ggtitle(get.params(p2))

# specify method = 'gam' and splines for formula
p3 <- p + geom_smooth(method='gam',
              method.args = list(family = "binomial"),
              formula = y ~ splines::ns(x, 7))
p3 + ggtitle(get.params(p3))

# specify method = 'glm'
# (formula defaults to `y ~ x`)
p4 <- p + geom_smooth(method='glm')
p4 + ggtitle(get.params(p4))

# default plot for fewer observations
# (method defaults to loess & formula to `y ~ x`)
# observe that function is able to distinguish between plot data 
# & data actually used by the layer
p5 <- p + geom_smooth(data = function(d) d[1:500, ]) # keep only the first 500 rows for this layer
p5 + ggtitle(get.params(p5))
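The methods that get.params() recovers for p1 and p5 follow the rule in the geom_smooth() documentation for method = NULL: loess when the largest group (across all panels) has fewer than 1,000 observations, otherwise gam. The helper default_smooth() below is hypothetical, not part of ggplot2; it merely restates that documented rule (assuming ggplot2 >= 3.3.0 defaults) so the decision can be written out explicitly in a methods section.

```r
# hypothetical helper restating the documented default of geom_smooth(method = NULL);
# it is NOT a ggplot2 function and assumes ggplot2 >= 3.3.0 defaults
default_smooth <- function(group_sizes) {
  # the decision is based on the size of the largest group across all panels
  if (max(group_sizes) < 1000) {
    list(method = "loess", formula = "y ~ x")
  } else {
    list(method = "gam", formula = 'y ~ s(x, bs = "cs")', gam_method = "REML")
  }
}

default_smooth(500)$method          # "loess": the 500-row layer in p5
default_smooth(3000)$method         # "gam": the full 3000-row data in p1
default_smooth(c(600, 700))$method  # "loess": two groups, neither reaching 1000
```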

[Image: the resulting plots]
