Improving plotting of probability density functions in ggplot2

Question

I am using ggplot to draw multiple known density functions, for example the gamma density function:

library(tidyverse)
apar<-c(1,2,7.5,9)
bpar<-c(2,2,1.3,0.5)
gmaxlim<-c(0, 25)
pgma1<-ggplot(data = data.frame(x = gmaxlim), aes(gmaxlim)) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[1], scale = bpar[1]),aes(color="black")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[2], scale = bpar[2]),aes(color="red")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[3], scale = bpar[3]),aes(color="blue")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[4], scale = bpar[4]),aes(color="green")) +
  ylab(expression(paste("f(x|",alpha,",",beta,")"))) +xlab("x") + scale_x_continuous(breaks=seq(gmaxlim[1],gmaxlim[2], by =5)) + 
  scale_color_identity(name = "",
                       breaks = c("black", "red", "blue","green"),
                       labels = c(substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[1],s=bpar[1])),
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[2],s=bpar[2])), 
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[3],s=bpar[3])),
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[4],s=bpar[4]))),
                       guide = "legend")+
  theme_bw()
pgma1

Created on 2020-07-31 by the reprex package (v0.3.0)

However, this code is far from efficient and goes against the ggplot philosophy (perhaps because we are not plotting any "real" data set?). Is there a way to write this more efficiently so that it scales to a different number of parameter pairs? I would like to have just one stat_function call and to simplify scale_color_identity if possible. Retaining mathematical expressions in the color labels is mandatory.

Answer 1

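Perhaps use lapply? Adding a list of layers to a ggplot object adds each layer in turn, so a single stat_function call can cover every parameter pair: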

library(tidyverse)
apar <- c(1,2,7.5,9)
bpar <- c(2,2,1.3,0.5)
gmaxlim <- c(0, 25)
mycols <- c("black", "red", "blue", "green")

# The x aesthetic refers to the column of the data frame, not the global vector
ggplot(data = data.frame(x = gmaxlim), aes(x)) +
  # One stat_function layer per parameter pair
  lapply(seq_along(apar), function(i) {
    stat_function(fun = dgamma, n = 101,
                  args = list(shape = apar[i], scale = bpar[i]),
                  aes(color = mycols[i]))
  }) +
  # One plotmath label per colour, in the same order as mycols
  scale_color_identity(name = "", breaks = mycols,
                       labels = lapply(seq_along(apar), function(i) {
                         substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
                                    list(v = apar[i], s = bpar[i]))
                       }),
                       guide = "legend") +
  theme_bw()

Created on 2020-07-31 by the reprex package (v0.3.0)
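
As a rough sketch of how the lapply idea could be packaged for reuse, the layer list and the label list can each be built by a small function. The names gamma_layers and gamma_labels are hypothetical helpers made up for this illustration, not part of ggplot2, and the sketch assumes the shape, scale and colour vectors all have the same length.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns a list with one
# stat_function layer per (shape, scale, colour) triple.
gamma_layers <- function(shape, scale, cols, n = 101) {
  stopifnot(length(shape) == length(scale), length(shape) == length(cols))
  lapply(seq_along(shape), function(i) {
    stat_function(fun = dgamma, n = n,
                  args = list(shape = shape[i], scale = scale[i]),
                  aes(color = cols[i]))
  })
}

# Hypothetical helper: plotmath labels in the same order as the layers.
gamma_labels <- function(shape, scale) {
  lapply(seq_along(shape), function(i) {
    substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
               list(v = shape[i], s = scale[i]))
  })
}

apar   <- c(1, 2, 7.5, 9)
bpar   <- c(2, 2, 1.3, 0.5)
mycols <- c("black", "red", "blue", "green")

# The list returned by gamma_layers() is added to the plot like a single layer.
p <- ggplot(data.frame(x = c(0, 25)), aes(x)) +
  gamma_layers(apar, bpar, mycols) +
  scale_color_identity(name = "", breaks = mycols,
                       labels = gamma_labels(apar, bpar),
                       guide = "legend") +
  theme_bw()
p
```

Changing the number of curves then only means changing the three input vectors; the plotting code stays the same.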

Answer 2

I'm a bit mystified as to why so many people try to do so much with the stat functions in ggplot instead of passing the data they actually want to plot. Using stat_function is good for drawing the odd line directly, but trying to coerce it into doing complicated stuff like drawing families of distributions by referencing external vectors just seems like doing it the hard way.

It's easier to reason about, and takes less code, to just work out what you want to plot and to plot it:

library(ggplot2)

apar <- c(1, 2, 7.5, 9)
bpar <- c(2, 2, 1.3, 0.5)
x    <- seq(0, 25, 0.25)
# Density values for each parameter pair, stacked into one long vector
y    <- as.vector(sapply(1:4, function(i) dgamma(x, apar[i], scale = bpar[i])))
df   <- data.frame(x = rep(x, 4), y, group = rep(letters[1:4], each = length(x)))
# One plotmath label per parameter pair, kept as a list of calls
labs <- lapply(1:4, function(i) {
  substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
             list(v = apar[i], s = bpar[i]))
})

ggplot(data = df, aes(x, y)) + geom_line(aes(color = group)) +
  ylab(expression(paste("f(x|", alpha, ",", beta,")"))) +
  scale_color_manual(values = c(1, 2, 4, 3), labels = labs) +
  theme_bw()
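
The code above hardcodes four parameter pairs. As a minimal sketch of how the same data-frame approach might scale to any number of pairs, the two functions below build the long data frame and the labels from vectors of equal length. The names gamma_density_df and gamma_bquote_labels are hypothetical, invented for this example, and the labels use base R's bquote as an alternative to substitute(paste(...)); the default ggplot2 colour scale is assumed to be acceptable.

```r
library(ggplot2)

# Hypothetical helper: long data frame with one block of rows per parameter pair.
gamma_density_df <- function(shape, scale, x = seq(0, 25, by = 0.25)) {
  stopifnot(length(shape) == length(scale))
  do.call(rbind, lapply(seq_along(shape), function(i) {
    data.frame(
      x     = x,
      y     = dgamma(x, shape = shape[i], scale = scale[i]),
      # Factor levels follow the input order, so labels line up with groups
      group = factor(i, levels = seq_along(shape))
    )
  }))
}

# Hypothetical helper: one plotmath label per parameter pair.
# In plotmath, `==` draws "=", `*` juxtaposes and `~` inserts a space.
gamma_bquote_labels <- function(shape, scale) {
  lapply(seq_along(shape), function(i) {
    bquote(alpha == .(shape[i]) * "," ~ beta == .(scale[i]))
  })
}

apar <- c(1, 2, 7.5, 9)
bpar <- c(2, 2, 1.3, 0.5)
df   <- gamma_density_df(apar, bpar)

p <- ggplot(df, aes(x, y, color = group)) +
  geom_line() +
  scale_color_discrete(name = NULL, labels = gamma_bquote_labels(apar, bpar)) +
  labs(x = "x", y = expression(paste("f(x|", alpha, ",", beta, ")"))) +
  theme_bw()
p
```

Because the grouping column is an ordinary variable in the data, adding a fifth parameter pair only requires extending apar and bpar.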

solution1
2 ACCEPTED 2020-07-31 15:23:21

solution2
2 2020-07-31 15:29:53
