
Improving plotting of probability density functions in ggplot2

I am using ggplot to draw multiple known density functions, for example the gamma density function:

library(tidyverse)
apar<-c(1,2,7.5,9)
bpar<-c(2,2,1.3,0.5)
gmaxlim<-c(0, 25)
pgma1<-ggplot(data = data.frame(x = gmaxlim), aes(gmaxlim)) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[1], scale = bpar[1]),aes(color="black")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[2], scale = bpar[2]),aes(color="red")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[3], scale = bpar[3]),aes(color="blue")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[4], scale = bpar[4]),aes(color="green")) +
  ylab(expression(paste("f(x|",alpha,",",beta,")"))) +xlab("x") + scale_x_continuous(breaks=seq(gmaxlim[1],gmaxlim[2], by =5)) + 
  scale_color_identity(name = "",
                       breaks = c("black", "red", "blue","green"),
                       labels = c(substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[1],s=bpar[1])),
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[2],s=bpar[2])), 
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[3],s=bpar[3])),
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[4],s=bpar[4]))),
                       guide = "legend")+
  theme_bw()
pgma1

Created on 2020-07-31 by the reprex package (v0.3.0)

However, this code is far from efficient, and it goes against the ggplot philosophy (perhaps because we are not plotting any "real" data set?). Is there a way to write this more efficiently so that it scales to a different number of parameter pairs? I would like to have just one stat_function line and to simplify the scale_color_identity call if possible. Retaining the mathematical expressions in the color labels is mandatory.

Perhaps use lapply to generate the layers?

library(tidyverse)
apar <- c(1,2,7.5,9)
bpar <- c(2,2,1.3,0.5)
gmaxlim <- c(0, 25)
mycols <- c("black", "red", "blue", "green")

ggplot(data = data.frame(x = gmaxlim), aes(gmaxlim)) +
  lapply(seq_along(apar), function(i) {
    stat_function(fun = dgamma, n = 101,
                  args = list(shape = apar[i], scale = bpar[i]),
                  aes(color = mycols[i]))
  }) +
  scale_color_identity(name = "", breaks = mycols,
                       labels = lapply(seq_along(apar), function(i)
                         substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
                                    list(v = apar[i], s = bpar[i]))),
                       guide = "legend") +
  theme_bw()

Created on 2020-07-31 by the reprex package (v0.3.0)
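
This works because ggplot2's + operator accepts a list and adds each element in turn, so lapply can produce one stat_function layer per parameter pair. A minimal sketch that wraps the same idea in a function, so the number of pairs is no longer hard-coded; gamma_layers is a hypothetical helper name (not part of ggplot2), and the colours are simply the ones used above:

```r
library(ggplot2)

# Hypothetical helper: one stat_function layer per (shape, scale) pair,
# followed by an identity colour scale with plotmath labels.
gamma_layers <- function(shape, scale, cols, n = 101) {
  stopifnot(length(shape) == length(scale), length(shape) == length(cols))
  curves <- lapply(seq_along(shape), function(i) {
    stat_function(mapping = aes(color = cols[i]), fun = dgamma, n = n,
                  args = list(shape = shape[i], scale = scale[i]))
  })
  labs <- lapply(seq_along(shape), function(i) {
    substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
               list(v = shape[i], s = scale[i]))
  })
  # Return everything as one list so it can be added with a single `+`
  c(curves, list(scale_color_identity(name = "", breaks = cols,
                                      labels = labs, guide = "legend")))
}

layers <- gamma_layers(shape = c(1, 2, 7.5, 9), scale = c(2, 2, 1.3, 0.5),
                       cols = c("black", "red", "blue", "green"))
p <- ggplot(data.frame(x = c(0, 25)), aes(x)) + layers + theme_bw()
```

Printing p should draw the four curves; adding a fifth pair only requires extending the three input vectors.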

I'm a bit mystified as to why so many people try to do so much with the stat functions in ggplot instead of passing the data they actually want to plot. Using stat_function is good for drawing the odd line directly, but trying to coerce it into doing complicated stuff like drawing families of distributions by referencing external vectors just seems like doing it the hard way.

It's easier to reason about, and takes less code, to just work out what you want to plot and to plot it:

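library(ggplot2)

apar <- c(1, 2, 7.5, 9)
bpar <- c(2, 2, 1.3, 0.5)
x    <- seq(0, 25, 0.25)
# One column of densities per parameter pair, flattened column by column
y    <- as.vector(sapply(seq_along(apar), function(i) dgamma(x, apar[i], scale = bpar[i])))
df   <- data.frame(x = rep(x, length(apar)), y,
                   group = rep(letters[seq_along(apar)], each = length(x)))
# Plotmath label for each parameter pair
labs <- sapply(seq_along(apar), function(i) {
               substitute(paste(alpha,"= ", v," ,",beta,"= ",s), 
               list(v = apar[i], s = bpar[i]))})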

ggplot(data = df, aes(x, y)) + geom_line(aes(color = group)) +
  ylab(expression(paste("f(x|", alpha, ",", beta,")"))) +
  scale_color_manual(values = c(1, 2, 4, 3), labels = labs) +
  theme_bw()
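
The same data-first approach can be made independent of the number of parameter pairs. A minimal sketch in base R, assuming a hypothetical helper gamma_density_df (not from the answer above) that returns one block of rows per parameter pair and uses a factor for the group so that the legend order follows the input order:

```r
# Hypothetical helper: long-format data frame of gamma densities,
# one block of rows per (shape, scale) pair.
gamma_density_df <- function(shape, scale, x = seq(0, 25, 0.25)) {
  stopifnot(length(shape) == length(scale))
  pieces <- lapply(seq_along(shape), function(i) {
    data.frame(x = x,
               y = dgamma(x, shape = shape[i], scale = scale[i]),
               group = factor(i, levels = seq_along(shape)))
  })
  do.call(rbind, pieces)
}

apar <- c(1, 2, 7.5, 9)
bpar <- c(2, 2, 1.3, 0.5)
df   <- gamma_density_df(apar, bpar)
```

The resulting df has the same x, y and group columns as before, so it can be mapped with aes(x, y) and color = group; the values passed to scale_color_manual would need one entry per parameter pair.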

